# Mimid: Inferring Grammars
* Code for subjects [here](#Our-subject-programs)
* Evaluation starts [here](#Evaluation)
* The evaluation on specific subjects starts [here](#Subjects)
* [CGIDecode](#CGIDecode)
* [Calculator](#Calculator)
* [MathExpr](#MathExpr)
* [URLParse](#URLParse)
* [Netrc](#Netrc)
* [Microjson](#Microjson)
* Results are [here](#Results)
* Recovering parse tree from a recognizer is [here](#Using-a-Recognizer-(not-a-Parser))
* Recovering parse tree from parser combinators is [here](#Parsing-with-Parser-Combinators)
* Recovering parse tree from PEG parser is [here](#Parsing-with-PEG-Parser)
Please note that a complete run can take an hour and a half.
We start with a few Jupyter magics that let us specify examples inline; these can be turned off if needed for faster execution. Switch `TOP` to `False` if you do not want the examples to run.
TOP = __name__ == '__main__'
The magics we use are `%%var` and `%top`. The `%%var` lets us specify large strings such as file contents directly without too many escapes. The `%top` helps with examples.
from IPython.core.magic import (Magics, magics_class, cell_magic, line_magic, line_cell_magic)
class B(dict):
def __getattr__(self, name):
return self.__getitem__(name)
@magics_class
class MyMagics(Magics):
def __init__(self, shell=None, **kwargs):
super().__init__(shell=shell, **kwargs)
self._vars = B()
shell.user_ns['VARS'] = self._vars
@cell_magic
def var(self, line, cell):
self._vars[line.strip()] = cell.strip()
@line_cell_magic
def top(self, line, cell=None):
if TOP:
if cell is None:
cell = line
ip = get_ipython()
res = ip.run_cell(cell)
get_ipython().register_magics(MyMagics)
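As an aside, the `B` helper above is simply a `dict` whose keys can also be read as attributes, which allows `VARS.name`-style access if desired. A minimal sketch:

```python
class B(dict):
    def __getattr__(self, name):
        # Fall through to normal dict lookup for unknown attributes.
        return self.__getitem__(name)

b = B()
b['Mimid'] = 'Testing'
assert b.Mimid == 'Testing'     # attribute access works
assert b['Mimid'] == 'Testing'  # and so does ordinary indexing
```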
import sys
Parts of the program, especially the subprocess execution using `do()`, require the new flags in Python `3.7`. I am not sure if the taints will work on anything above it.
%top assert sys.version_info[0:2] == (3, 7)
from subprocess import run
import os
We keep a log of all system commands executed for easier debugging at `./build/do.log`.
import json

def do(command, env=None, shell=False, log=False, **args):
    result = run(command, universal_newlines=True, shell=shell,
                 env=dict(os.environ, **({} if env is None else env)),
                 capture_output=True, **args)
    if log:
        with open('build/do.log', 'a+') as f:
            print(json.dumps({'cmd': command, 'env': env, 'exitcode': result.returncode}), file=f)
    return result
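As a quick sanity check, the helper can be exercised as follows. This is a self-contained restatement of `do()` without logging, and it assumes a POSIX `echo` is available:

```python
import os
from subprocess import run

def do(command, env=None, shell=False, **args):
    # Minimal restatement of the helper above, without the do.log bookkeeping.
    return run(command, universal_newlines=True, shell=shell,
               env=dict(os.environ, **({} if env is None else env)),
               capture_output=True, **args)

r = do(['echo', 'hello'])
assert r.returncode == 0
assert r.stdout.strip() == 'hello'
```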
import random
Try to ensure replicability of measurements.
random.seed(0)
Note that this notebook was tested on Debian GNU/Linux 8.10 and 9.9 and on macOS Mojave 10.14.5. In particular, I do not know if everything will work on Windows.
import shutil
%%top
if shutil.which('lsb_release'):
res = do(['lsb_release', '-d']).stdout
elif shutil.which('sw_vers'):
res = do(['sw_vers']).stdout
else:
assert False
res
'ProductName:\tMac OS X\nProductVersion:\t10.14.5\nBuildVersion:\t18F132\n'
%top do(['jupyter', '--version']).stdout
'jupyter core : 4.5.0\njupyter-notebook : 5.7.8\nqtconsole : 4.5.1\nipython : 7.6.1\nipykernel : 5.1.1\njupyter client : 5.3.1\njupyter lab : not installed\nnbconvert : 5.5.0\nipywidgets : 7.5.0\nnbformat : 4.4.0\ntraitlets : 4.3.2\n'
Our code is based on the utilities provided by the [Fuzzingbook](http://fuzzingbook.org). Note that the measurements on time and precision in the paper were based on Fuzzingbook `0.0.7`. During development, we found a few bugs in Autogram, which we communicated back, and which resulted in a new version of Fuzzingbook, `0.8.0`.
The fixed *Autogram* implementation in the *Fuzzingbook* has better precision rates for *Autogram*, and better timing for grammar generation. However, these numbers still fall short of *Mimid* for most grammars. Further, the grammars generated by *Autogram* are still enumerative. That is, rather than producing a context-free grammar, it simply appends input strings as alternates to the `<START>` nonterminal. This again results in bad recall numbers, as before. Hence, it does not change our main points. For the remainder of this notebook, we use the `0.8.0` version of the Fuzzingbook.
First we define `pip_install()`, a helper to silently install required dependencies.
def pip_install(v):
return do(['pip', 'install', '-qqq', *v.split(' ')]).returncode
%top pip_install('fuzzingbook==0.8.0')
0
Our external dependencies other than `fuzzingbook` are as follows.
%top pip_install('astor graphviz scipy')
0
**IMPORTANT:** Restart the jupyter server after installation of dependencies and extensions.
We recommend the following jupyter notebook extensions:
%top pip_install('jupyter_contrib_nbextensions jupyter_nbextensions_configurator')
0
%top do(['jupyter','contrib','nbextension','install','--user']).returncode
0
def nb_enable(v): return do(['jupyter','nbextension','enable',v]).returncode
%top do(['jupyter','nbextensions_configurator','enable','--user']).returncode
0
#### Table of contents
Please install this extension. The navigation in the notebook is rather hard without this installed.
%top nb_enable('toc2/main')
0
#### Collapsible headings
Again, do install this extension. This will let you fold away those sections that you do not have an immediate interest in.
%top nb_enable('collapsible_headings/main')
0
#### Execute time
This is not strictly necessary, but can provide a better breakdown than the `timeit` we use for timing.
%top nb_enable('execute_time/ExecuteTime')
0
#### Code folding
Very helpful for hiding away the source of library code that is not relevant for grammar recovery.
%top nb_enable('codefolding/main')
0
To make runs faster, we cache quite a lot of things. Remove `build` if you change code or samples.
%top do(['rm', '-rf','build']).returncode
0
As we mentioned before, `%%var` defines a multi-line embedded string that is accessible from Python.
%%var Mimid
# [(
Testing Mimid
# )]
%top VARS['Mimid']
'# [(\nTesting Mimid\n# )]'
Note that our taint tracking implementation is incomplete in that only some of the functions in Python are proxied to preserve taints. Hence, we modify the source slightly where necessary to use the proxied functions, without affecting the evaluation of the grammar inference algorithm.
### Calculator.py
This is a really simple calculator written in textbook recursive-descent style. Note that I have used `list()` in a few places to help out with taint tracking. This is due to the limitations of my taint-tracking prototype. It can be fixed if required by simple AST walkers or better taint trackers.
%%var calc_src↔
### Mathexpr.py
Originally from [here](https://github.com/louisfisch/mathematical-expressions-parser). The mathexpr is much more complicated than our `calculator` and supports advanced functionality such as predefined functions and variables.
%%var mathexpr_src↔
### Microjson.py
The microjson is a complete pure-Python implementation of a JSON parser, obtained from [here](https://github.com/phensley/microjson). Note that we use `myio`, which is an instrumented version of the original `io`, to preserve taints.
%%var microjson_src↔
### URLParse.py
This is the URL parser that is part of the Python distribution. The source was obtained from [here](https://github.com/python/cpython/blob/3.6/Lib/urllib/parse.py).
%%var urlparse_src↔
### Netrc.py
Netrc is the initialization file that is read by web agents such as curl. Python ships the netrc library in its standard distribution. This was taken from [here](https://github.com/python/cpython/blob/3.6/Lib/netrc.py). Note that we use `mylex` and `myio`, which correspond to `shlex` and `io` from the Python distribution, but are instrumented to preserve taints.
%%var netrc_src↔
### CGIDecode.py
CGIDecode is a program to decode a URL-encoded string. The source for this program was obtained from [here](https://www.fuzzingbook.org/html/Coverage.html).
%%var cgidecode_src↔
### Subject Registry
We store all our subject programs under `program_src`.
# [(
program_src = {
'calculator.py': VARS['calc_src'],
'mathexpr.py': VARS['mathexpr_src'],
'urlparse.py': VARS['urlparse_src'],
'netrc.py': VARS['netrc_src'],
'cgidecode.py': VARS['cgidecode_src'],
'microjson.py': VARS['microjson_src']
}
# )]
## Rewriting the source to track control flow and taints.
We rewrite the source so that `astring in value` gets converted to `taint_wrap__(astring).in_(value)`. Note that what we are tracking is not really taints, but rather _character accesses_ to the origin string.
We also rewrite the methods so that method bodies are enclosed in a `method__` context manager, and any `if` conditions and `while` loops (only `while` for now) are enclosed in an outer `stack__` and an inner `scope__` context manager. This lets us track when the corresponding scopes are entered and left.
import ast
import astor
### InRewriter
The `InRewriter` class handles transforming `in` statements so that taints can be tracked. It has two methods. The `wrap()` method transforms any `a in lst` calls to `taint_wrap__(a) in lst`.
class InRewriter(ast.NodeTransformer):
def wrap(self, node):
return ast.Call(func=ast.Name(id='taint_wrap__', ctx=ast.Load()), args=[node], keywords=[])
The `wrap()` method is internally used by `visit_Compare()` method to transform `a in lst` to `taint_wrap__(a).in_(lst)`. We need to do this because Python ties the overriding of `in` operator to the `__contains__()` method in the class of `lst`. In our case, however, very often `a` is the element tainted and hence proxied. Hence we need a method invoked on the `a` object.
class InRewriter(InRewriter):
def visit_Compare(self, tree_node):
left = tree_node.left
if not tree_node.ops or not isinstance(tree_node.ops[0], ast.In):
return tree_node
mod_val = ast.Call(
func=ast.Attribute(
value=self.wrap(left),
attr='in_'),
args=tree_node.comparators,
keywords=[])
return mod_val
Tying it together.
def rewrite_in(src):
v = ast.fix_missing_locations(InRewriter().visit(ast.parse(src)))
source = astor.to_source(v)
return "%s" % source
from fuzzingbook.fuzzingbook_utils import print_content
%top print_content(rewrite_in('s in ["a", "b", "c"]'))
taint_wrap__(s).in_(['a', 'b', 'c'])
### Rewriter
The `Rewriter` class handles inserting tracing probes into methods and control structures. Essentially, we insert a `with` scope for the method body, and a `with` scope outside both `while` and `if` statements. Finally, we insert a `with` scope inside the `while` and `if` bodies. IMPORTANT: We only implement the `while` context. Something similar should be implemented for the `for` context.
A few counters to provide unique identifiers for context managers. Essentially, we number each `if` and `while` that we see.
class Rewriter(InRewriter):
def init_counters(self):
self.if_counter = 0
self.while_counter = 0
The `methods` list is used to keep track of the current method stack during execution. `Epsilon` and `NoEpsilon` are simply constants that I use to indicate whether an `if` or a loop is nullable or not. If it is nullable, I mark it with `Epsilon`.
methods = []
Epsilon = '-'
NoEpsilon = '='
The `wrap_in_method()` generates a wrapper for method definitions.
class Rewriter(Rewriter):
def wrap_in_method(self, body, args):
method_name_expr = ast.Str(methods[-1])
my_args = ast.List(args.args, ast.Load())
args = [method_name_expr, my_args]
scope_expr = ast.Call(func=ast.Name(id='method__', ctx=ast.Load()), args=args, keywords=[])
return [ast.With(items=[ast.withitem(scope_expr, ast.Name(id='_method__'))], body=body)]
The method `visit_FunctionDef()` is the method rewriter that actually does the job.
class Rewriter(Rewriter):
def visit_FunctionDef(self, tree_node):
self.init_counters()
methods.append(tree_node.name)
self.generic_visit(tree_node)
tree_node.body = self.wrap_in_method(tree_node.body, tree_node.args)
return tree_node
The `rewrite_def()` method wraps the function definitions in scopes.
def rewrite_def(src):
v = ast.fix_missing_locations(Rewriter().visit(ast.parse(src)))
return astor.to_source(v)
We can use it as follows:
%top print_content(rewrite_def('\n'.join(program_src['calculator.py'].split('\n')[12:19])), 'calculator.py')
def parse_paren(s, i):
    with method__('parse_paren', [s, i]) as _method__:
        assert s[i] == '('
        i, v = parse_expr(s, i + 1)
        if s[i:] == '':
            raise Exception(s, i)
        assert s[i] == ')'
#### The stack wrapper
The method `wrap_in_outer()` adds a `with ..stack..()` context _outside_ the control structures. The stack is used to keep track of the current control structure stack for any character comparison made. Notice the `can_empty` parameter. This indicates that the particular structure is _nullable_. For `if` we can make the condition right away. For `while` we postpone the decision.
class Rewriter(Rewriter):
def wrap_in_outer(self, name, can_empty, counter, node):
name_expr = ast.Str(name)
can_empty_expr = ast.Str(can_empty)
counter_expr = ast.Num(counter)
method_id = ast.Name(id='_method__')
args = [name_expr, counter_expr, method_id, can_empty_expr]
scope_expr = ast.Call(func=ast.Name(id='stack__', ctx=ast.Load()),
args=args, keywords=[])
return ast.With(
items=[ast.withitem(scope_expr, ast.Name(id='%s_%d_stack__' % (name, counter)))],
body=[node])
#### The scope wrapper
The method `wrap_in_inner()` adds a `with ...scope..()` context immediately inside the control structure. For `while`, this means simply adding one `with ...scope..()` just before the first line. For `if`, this means adding one `with ...scope...()` each to each branch of the `if` condition.
class Rewriter(Rewriter):
def wrap_in_inner(self, name, counter, val, body):
val_expr = ast.Num(val)
stack_iter = ast.Name(id='%s_%d_stack__' % (name, counter))
method_id = ast.Name(id='_method__')
args = [val_expr, stack_iter, method_id]
scope_expr = ast.Call(func=ast.Name(id='scope__', ctx=ast.Load()),
args=args, keywords=[])
return [ast.With(
items=[ast.withitem(scope_expr, ast.Name(id='%s_%d_%d_scope__' % (name, counter, val)))],
body=body)]
#### Rewriting `If` conditions
While rewriting `if` conditions, we have to take care of cascading `if` conditions (`elif`), which are represented as nested `if` nodes in the AST. They do not require a separate `stack` context, only separate `scope` contexts.
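The nested representation can be checked directly with Python's `ast` module:

```python
import ast

tree = ast.parse(
"""if a:
    x = 1
elif b:
    x = 2
""")
outer_if = tree.body[0]
# The elif arm is not a sibling node: it is a single nested ast.If
# stored in the orelse list of the outer if.
assert isinstance(outer_if, ast.If)
assert len(outer_if.orelse) == 1
assert isinstance(outer_if.orelse[0], ast.If)
```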
class Rewriter(Rewriter):
def process_if(self, tree_node, counter, val=None):
if val is None: val = 0
else: val += 1
if_body = []
self.generic_visit(tree_node.test)
for node in tree_node.body: self.generic_visit(node)
tree_node.body = self.wrap_in_inner('if', counter, val, tree_node.body)
# else part.
if len(tree_node.orelse) == 1 and isinstance(tree_node.orelse[0], ast.If):
self.process_if(tree_node.orelse[0], counter, val)
else:
if tree_node.orelse:
val += 1
for node in tree_node.orelse: self.generic_visit(node)
tree_node.orelse = self.wrap_in_inner('if', counter, val, tree_node.orelse)
While rewriting `if` conditions, we have to take care of cascading `if` conditions, which are represented as nested `if` nodes in the AST. We need to identify whether the cascading `if` conditions (`elif`) have an empty `orelse` clause or not. If the `orelse` is empty, then the entire set of `if` conditions may be excised and still produce a valid value. Hence, it should be marked as optional. The `visit_If()` method checks whether the cascading `if`s have an `orelse` or not.
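The walk inside `visit_If()` can be restated as a small standalone predicate. This is an illustrative sketch; `cascading_if_has_else` is a name introduced here, not part of the implementation:

```python
import ast

def cascading_if_has_else(if_node):
    """Does this if/elif chain end in an else branch?"""
    start = if_node
    while start:
        if isinstance(start, ast.If):
            if not start.orelse:
                return False             # no else anywhere: whole chain is skippable
            elif len(start.orelse) == 1:
                start = start.orelse[0]  # descend into the elif (or lone else stmt)
            else:
                return True              # multi-statement else body
        else:
            return True                  # reached a plain else statement
    return True

no_else = ast.parse("if a:\n    x = 1\nelif b:\n    x = 2").body[0]
with_else = ast.parse("if a:\n    x = 1\nelse:\n    x = 2").body[0]
assert cascading_if_has_else(no_else) is False   # nullable: marked Epsilon
assert cascading_if_has_else(with_else) is True  # not nullable: NoEpsilon
```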
class Rewriter(Rewriter):
def visit_If(self, tree_node):
self.if_counter += 1
counter = self.if_counter
#is it empty
start = tree_node
while start:
if isinstance(start, ast.If):
if not start.orelse:
start = None
elif len(start.orelse) == 1:
start = start.orelse[0]
else:
break
else:
break
self.process_if(tree_node, counter=self.if_counter)
can_empty = NoEpsilon if start else Epsilon # NoEpsilon for + and Epsilon for *
return self.wrap_in_outer('if', can_empty, counter, tree_node)
#### Rewriting `while` loops
Rewriting `while` loops is simple. We wrap them in `stack` and `scope` contexts. We do not implement the `orelse` feature yet.
class Rewriter(Rewriter):
def visit_While(self, tree_node):
self.generic_visit(tree_node)
self.while_counter += 1
counter = self.while_counter
test = tree_node.test
body = tree_node.body
assert not tree_node.orelse
tree_node.body = self.wrap_in_inner('while', counter, 0, body)
return self.wrap_in_outer('while', '?', counter, tree_node)
def rewrite_cf(src, original):
v = ast.fix_missing_locations(Rewriter().visit(ast.parse(src)))
return astor.to_source(v)
An example with `if` conditions.
%top print_content('\n'.join(program_src['calculator.py'].split('\n')[12:19]), 'calculator.py')
def parse_paren(s, i):
    assert s[i] == '('
    i, v = parse_expr(s, i+1)
    if s[i:] == '':
        raise Exception(s, i)
    assert s[i] == ')'
%top print_content(rewrite_cf('\n'.join(program_src['calculator.py'].split('\n')[12:19]), 'calculator.py').strip(), filename='calculator.py')
def parse_paren(s, i):
    with method__('parse_paren', [s, i]) as _method__:
        assert s[i] == '('
        i, v = parse_expr(s, i + 1)
        with stack__('if', 1, _method__, '-') as if_1_stack__:
            if s[i:] == '':
                with scope__(0, if_1_stack__, _method__) as if_1_0_scope__:
                    raise Exception(s, i)
        assert s[i] == ')'
An example with `while` loops.
%top print_content('\n'.join(program_src['calculator.py'].split('\n')[5:11]), 'calculator.py')
def parse_num(s,i):
n = ''
while s[i:] and is_digit(s[i]):
n += s[i]
i = i +1
%top print_content(rewrite_cf('\n'.join(program_src['calculator.py'].split('\n')[5:11]), 'calculator.py'), filename='calculator.py')
def parse_num(s, i):
    with method__('parse_num', [s, i]) as _method__:
        n = ''
        with stack__('while', 1, _method__, '?') as while_1_stack__:
            while s[i:] and is_digit(s[i]):
                with scope__(0, while_1_stack__, _method__) as while_1_0_scope__:
                    n += s[i]
                    i = i + 1
#### Generating the complete instrumented source
For the complete instrumented source, we need to first make sure that all necessary imports are satisfied. Next, we also need to invoke the parser with the necessary tainted input and output the trace.
def rewrite(src, original):
src = ast.fix_missing_locations(InRewriter().visit(ast.parse(src)))
v = ast.fix_missing_locations(Rewriter().visit(ast.parse(src)))
header = """
from mimid_context import scope__, stack__, method__
import json
import sys
import taints
from taints import taint_wrap__
"""
source = astor.to_source(v)
footer = """
if __name__ == "__main__":
js = []
for arg in sys.argv[1:]:
with open(arg) as f:
mystring = f.read().strip().replace('\\n', ' ')
taints.trace_init()
tainted_input = taints.wrap_input(mystring)
main(tainted_input)
assert tainted_input.comparisons
j = {
'comparisons_fmt': 'idx, char, method_call_id',
'comparisons':taints.convert_comparisons(tainted_input.comparisons, mystring),
'method_map_fmt': 'method_call_id, method_name, children',
'method_map': taints.convert_method_map(taints.METHOD_MAP),
'inputstr': mystring,
'original': %s,
'arg': arg}
js.append(j)
print(json.dumps(js))
"""
footer = footer % repr(original)
return "%s\n%s\n%s" % (header, source, footer)
%top calc_parse_rewritten = rewrite(program_src['calculator.py'], original='calculator.py')
%top print_content(calc_parse_rewritten, filename='calculator.py')
from mimid_context import scope__, stack__, method__
import json
import sys
import taints
from taints import taint_wrap__
import string


def is_digit(i):
    with method__('is_digit', [i]) as _method__:
        return taint_wrap__(i).in_(list(string.digits))


def parse_num(s, i):
    with method__('parse_num', [s, i]) as _method__:
        n = ''
        with stack__('while', 1, _method__, '?') as while_1_stack__:
            while s[i:] and is_digit(s[i]):
                with scope__(0, while_1_stack__, _method__) as while_1_0_scope__:
                    n += s[i]
                    i = i + 1
        return i, n


def parse_paren(s, i):
    with method__('parse_paren', [s, i]) as _method__:
        assert s[i] == '('
        i, v = parse_expr(s, i + 1)
        with stack__('if', 1, _method__, '-') as if_1_stack__:
            if s[i:] == '':
                with scope__(0, if_1_stack__, _method__) as if_1_0_scope__:
                    raise Exception(s, i)
        assert s[i] == ')'
        return i + 1, v


def parse_expr(s, i=0):
    with method__('parse_expr', [s, i]) as _method__:
        expr = []
        is_op = True
        with stack__('while', 1, _method__, '?') as while_1_stack__:
            while s[i:]:
                with scope__(0, while_1_stack__, _method__) as while_1_0_scope__:
                    c = s[i]
                    with stack__('if', 1, _method__, '=') as if_1_stack__:
                        if taint_wrap__(c).in_(list(string.digits)):
                            with scope__(0, if_1_stack__, _method__) as if_1_0_scope__:
                                if not is_op:
                                    raise Exception(s, i)
                                i, num = parse_num(s, i)
                                expr.append(num)
                                is_op = False
                        elif taint_wrap__(c).in_(['+', '-', '*', '/']):
                            with scope__(1, if_1_stack__, _method__) as if_1_1_scope__:
                                if is_op:
                                    raise Exception(s, i)
                                expr.append(c)
                                is_op = True
                                i = i + 1
                        elif c == '(':
                            with scope__(2, if_1_stack__, _method__) as if_1_2_scope__:
                                if not is_op:
                                    raise Exception(s, i)
                                i, cexpr = parse_paren(s, i)
                                expr.append(cexpr)
                                is_op = False
                        elif c == ')':
                            with scope__(3, if_1_stack__, _method__) as if_1_3_scope__:
                                break
                        else:
                            with scope__(4, if_1_stack__, _method__) as if_1_4_scope__:
                                raise Exception(s, i)
        with stack__('if', 2, _method__, '-') as if_2_stack__:
            if is_op:
                with scope__(0, if_2_stack__, _method__) as if_2_0_scope__:
                    raise Exception(s, i)
        return i, expr


def main(arg):
    with method__('main', [arg]) as _method__:
        return parse_expr(arg)


if __name__ == "__main__":
    js = []
    for arg in sys.argv[1:]:
        with open(arg) as f:
            mystring = f.read().strip().replace('\n', ' ')
        taints.trace_init()
        tainted_input = taints.wrap_input(mystring)
        main(tainted_input)
        assert tainted_input.comparisons
        j = {
            'comparisons_fmt': 'idx, char, method_call_id',
            'comparisons': taints.convert_comparisons(tainted_input.comparisons, mystring),
            'method_map_fmt': 'method_call_id, method_name, children',
            'method_map': taints.convert_method_map(taints.METHOD_MAP),
            'inputstr': mystring,
            'original': 'calculator.py',
            'arg': arg}
        js.append(j)
    print(json.dumps(js))
We will now write the transformed sources.
do(['mkdir','-p','build','subjects','samples']).returncode
0
# [(
for file_name in program_src:
print(file_name)
with open("subjects/%s" % file_name, 'wb+') as f:
f.write(program_src[file_name].encode('utf-8'))
with open("build/%s" % file_name, 'w+') as f:
f.write(rewrite(program_src[file_name], file_name))
# )]
calculator.py
mathexpr.py
urlparse.py
netrc.py
cgidecode.py
microjson.py
### Context Managers
The context managers are probes inserted into the source code so that we know when execution enters and exits specific control-flow structures such as conditionals and loops. Note that source code access for these probes is not really a requirement. They can be inserted directly into binaries too, or even dynamically inserted using tools such as `dtrace`. For now, we make our life simple using AST editing.
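The idea can be sketched with a minimal, self-contained probe. The names here (`probe__`, `events`) are hypothetical stand-ins; the actual `method__`, `stack__`, and `scope__` probes follow below:

```python
# Events recorded by the probe, in execution order.
events = []

class probe__:
    """A toy tracing probe: records entry and exit of a lexical scope."""
    def __init__(self, label):
        self.label = label
    def __enter__(self):
        events.append(('enter', self.label))
        return self
    def __exit__(self, *args):
        events.append(('exit', self.label))

# Wrapping a loop and its body, in the style of stack__/scope__:
with probe__('while_1'):
    for c in 'ab':
        with probe__('loop_body'):
            pass

assert events[0] == ('enter', 'while_1')
assert events[-1] == ('exit', 'while_1')
assert events.count(('enter', 'loop_body')) == 2  # one entry per iteration
```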
#### Method context
The `method__` context handles the assignment of method name, as well as storing the method stack.
%%var mimid_method_context
# [(
import taints
import urllib.parse
def to_key(method, name, num):
return '%s:%s_%s' % (method, name, num)
class method__:
def __init__(self, name, args):
if not taints.METHOD_NUM_STACK: return
self.args = '_'.join([urllib.parse.quote(i) for i in args if type(i) == str])
if not self.args:
self.name = name
else:
self.name = "%s__%s" % (name, self.args) # <- not for now #TODO
if args and hasattr(args[0], 'tag'):
self.name = "%s:%s" % (args[0].tag, self.name)
taints.trace_call(self.name)
def __enter__(self):
if not taints.METHOD_NUM_STACK: return
taints.trace_set_method(self.name)
self.stack = []
return self
def __exit__(self, *args):
if not taints.METHOD_NUM_STACK: return
taints.trace_return()
taints.trace_set_method(self.name)
# )]
The stack context stores the current prefix and handles updating the stack that is stored in the method context.
xxxxxxxxxx
%%var mimid_stack_context
# [(
class stack__:
def __init__(self, name, num, method_i, can_empty):
if not taints.METHOD_NUM_STACK: return
self.method_stack = method_i.stack
self.can_empty = can_empty # '*' means yes, '+' means no, '?' means to be determined
self.name, self.num, self.method = name, num, method_i.name
self.prefix = to_key(self.method, self.name, self.num)
def __enter__(self):
if not taints.METHOD_NUM_STACK: return
if self.name in {'while'}:
self.method_stack.append(0)
elif self.name in {'if'}:
self.method_stack.append(-1)
else:
assert False
return self
def __exit__(self, *args):
if not taints.METHOD_NUM_STACK: return
self.method_stack.pop()
# )]
#### Scope context
The scope context correctly identifies when the control structure is entered and exited (in the case of loops), and which alternative is entered (in the case of if conditions).
xxxxxxxxxx
%%var mimid_scope_context
# [(
import json
class scope__:
def __init__(self, alt, stack_i, method_i):
if not taints.METHOD_NUM_STACK: return
self.name, self.num, self.method, self.alt = stack_i.name, stack_i.num, stack_i.method, alt
self.method_stack = method_i.stack
self.can_empty = stack_i.can_empty
def __enter__(self):
if not taints.METHOD_NUM_STACK: return
if self.name in {'while'}:
self.method_stack[-1] += 1
elif self.name in {'if'}:
pass
else:
assert False, self.name
uid = json.dumps(self.method_stack)
if self.name in {'while'}:
taints.trace_call('%s:%s_%s %s %s' % (self.method, self.name, self.num, self.can_empty, uid))
else:
taints.trace_call('%s:%s_%s %s %s#%s' % (self.method, self.name, self.num, self.can_empty, self.alt, uid))
taints.trace_set_method(self.name)
return self
def __exit__(self, *args):
if not taints.METHOD_NUM_STACK: return
taints.trace_return()
taints.trace_set_method(self.name)
# )]
### Taint Tracker
The taint tracker is essentially a reimplementation of the information flow taints from the Fuzzingbook. It incorporates tracing of character accesses. IMPORTANT: Not all methods are implemented.
xxxxxxxxxx
%%var taints_src
We write both files to the appropriate locations.
xxxxxxxxxx
# [(
with open('build/mimid_context.py', 'w+') as f:
print(VARS['mimid_method_context'], file=f)
print(VARS['mimid_stack_context'], file=f)
print(VARS['mimid_scope_context'], file=f)
with open('build/taints.py', 'w+') as f:
print(VARS['taints_src'], file=f)
# )]
Here is how one can generate traces for the `calc` program.
xxxxxxxxxx
%top do(['mkdir','-p','samples/calc']).returncode
0
xxxxxxxxxx
%top do(['mkdir','-p','samples/mathexpr']).returncode
0
xxxxxxxxxx
%%top
# [(
with open('samples/calc/0.csv', 'w+') as f:
print('9-(16+72)*3/458', file=f)
with open('samples/calc/1.csv', 'w+') as f:
print('(9)+3/4/58', file=f)
with open('samples/calc/2.csv', 'w+') as f:
print('8*3/40', file=f)
# )]
Generating traces on `mathexpr`.
xxxxxxxxxx
%%top
# [(
with open('samples/mathexpr/0.csv', 'w+') as f:
print('100', file=f)
with open('samples/mathexpr/1.csv', 'w+') as f:
print('2 + 3', file=f)
with open('samples/mathexpr/2.csv', 'w+') as f:
print('4 * 5', file=f)
# )]
xxxxxxxxxx
%top calc_trace_out = do("python build/calculator.py samples/calc/*.csv", shell=True).stdout
xxxxxxxxxx
%top mathexpr_trace_out = do("python build/mathexpr.py samples/mathexpr/*.csv", shell=True).stdout
xxxxxxxxxx
import json
xxxxxxxxxx
%top calc_trace = json.loads(calc_trace_out)
xxxxxxxxxx
%top mathexpr_trace = json.loads(mathexpr_trace_out)
### Reconstructing the Method Tree with Attached Character Comparisons
Reconstruct the actual method trace from a trace with the following
format
```
key : [ mid, method_name, children_ids ]
```
xxxxxxxxxx
def reconstruct_method_tree(method_map):
first_id = None
tree_map = {}
for key in method_map:
m_id, m_name, m_children = method_map[key]
children = []
if m_id in tree_map:
# just update the name and children
assert not tree_map[m_id]
tree_map[m_id]['id'] = m_id
tree_map[m_id]['name'] = m_name
tree_map[m_id]['indexes'] = []
tree_map[m_id]['children'] = children
else:
assert first_id is None
tree_map[m_id] = {'id': m_id, 'name': m_name, 'children': children, 'indexes': []}
first_id = m_id
for c in m_children:
assert c not in tree_map
val = {}
tree_map[c] = val
children.append(val)
return first_id, tree_map
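As a self-contained illustration, here is the reconstruction on a tiny hand-written `method_map` (the ids and names are hypothetical; the function is restated so the example runs on its own):

```python
def reconstruct_method_tree(method_map):
    first_id = None
    tree_map = {}
    for key in method_map:
        m_id, m_name, m_children = method_map[key]
        children = []
        if m_id in tree_map:
            # placeholder was created when the parent was processed;
            # fill in the name and children now
            assert not tree_map[m_id]
            tree_map[m_id]['id'] = m_id
            tree_map[m_id]['name'] = m_name
            tree_map[m_id]['indexes'] = []
            tree_map[m_id]['children'] = children
        else:
            assert first_id is None
            tree_map[m_id] = {'id': m_id, 'name': m_name,
                              'children': children, 'indexes': []}
            first_id = m_id
        for c in m_children:
            assert c not in tree_map
            val = {}
            tree_map[c] = val
            children.append(val)
    return first_id, tree_map

method_map = {
    '0': (0, None, [1]),          # pseudo root
    '1': (1, 'main', [2]),
    '2': (2, 'parse_num', []),
}
first, tree = reconstruct_method_tree(method_map)
print(first)                                           # 0
print(tree[0]['children'][0]['children'][0]['name'])   # parse_num
```

Note that the child placeholder dicts stored in `tree_map` are the same objects that are appended to the parent's `children` list, so filling them in later also completes the tree.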
Here is how one would use it. The first element in the returned tuple is the id of the topmost (root) method call.
xxxxxxxxxx
from fuzzingbook.GrammarFuzzer import display_tree
xxxxxxxxxx
%top first, calc_method_tree1 = reconstruct_method_tree(calc_trace[0]['method_map'])
xxxxxxxxxx
%top first, mathexpr_method_tree1 = reconstruct_method_tree(mathexpr_trace[0]['method_map'])
xxxxxxxxxx
def extract_node(node, id):
symbol = str(node['id'])
children = node['children']
annotation = str(node['name'])
return "%s:%s" % (symbol, annotation), children, ''
xxxxxxxxxx
%top v = display_tree(calc_method_tree1[0], extract_node=extract_node)
xxxxxxxxxx
from IPython.display import Image
xxxxxxxxxx
def zoom(v, zoom=True):
# return v directly if you do not want to zoom out.
if zoom:
return Image(v.render(format='png'))
return v
xxxxxxxxxx
%top zoom(v)
xxxxxxxxxx
%top zoom(display_tree(mathexpr_method_tree1[0], extract_node=extract_node))
#### Identifying last comparisons
We need only the last comparisons made on any index. This means that we should care only about the last parse in an ambiguous parse. However, to make concessions for the real world, we also check if we are overwriting a child (`HEURISTIC`). Note that `URLParser` is the only parser that needs this heuristic.
xxxxxxxxxx
def last_comparisons(comparisons):
HEURISTIC = True
last_cmp_only = {}
last_idx = {}
# get the last indexes compared in methods.
for idx, char, mid in comparisons:
if mid in last_idx:
if idx > last_idx[mid]:
last_idx[mid] = idx
else:
last_idx[mid] = idx
for idx, char, mid in comparisons:
if HEURISTIC:
if idx in last_cmp_only:
if last_cmp_only[idx] > mid:
# do not clobber children unless it was the last character
# for that child.
if last_idx[mid] > idx:
# if it was the last index, may be the child used it
# as a boundary check.
continue
last_cmp_only[idx] = mid
return last_cmp_only
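To see the heuristic in action, here is a synthetic comparison list (hypothetical method ids; the function is restated from above so the example runs on its own). Index 1 is compared by call 5 and then again by call 2; since call 2 went on to compare index 2 as well, its access to index 1 is treated as a boundary check, and call 5 keeps index 1.

```python
def last_comparisons(comparisons):
    HEURISTIC = True
    last_cmp_only = {}
    last_idx = {}
    # get the last indexes compared in methods.
    for idx, char, mid in comparisons:
        if mid in last_idx:
            if idx > last_idx[mid]:
                last_idx[mid] = idx
        else:
            last_idx[mid] = idx
    for idx, char, mid in comparisons:
        if HEURISTIC:
            if idx in last_cmp_only:
                if last_cmp_only[idx] > mid:
                    # do not clobber children unless it was the last
                    # character for that child.
                    if last_idx[mid] > idx:
                        continue
        last_cmp_only[idx] = mid
    return last_cmp_only

# (index, character, method id) tuples in trace order
synthetic = [(0, '9', 5), (1, '+', 5), (1, '+', 2), (2, '3', 2)]
result = last_comparisons(synthetic)
print(result)  # {0: 5, 1: 5, 2: 2}
```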
Here is how one would use it.
xxxxxxxxxx
%top calc_last_comparisons1 = last_comparisons(calc_trace[0]['comparisons'])
xxxxxxxxxx
%top calc_last_comparisons1
{0: 6,
1: 9,
2: 13,
3: 18,
4: 20,
5: 23,
6: 28,
7: 30,
8: 13,
9: 35,
10: 40,
11: 43,
12: 48,
13: 50,
14: 52}
xxxxxxxxxx
%top mathexpr_last_comparisons1 = last_comparisons(mathexpr_trace[0]['comparisons'])
xxxxxxxxxx
%top mathexpr_last_comparisons1
{0: 38, 1: 42, 2: 46}
#### Attaching characters to the tree
Add the comparison indexes to the method tree that we constructed.
xxxxxxxxxx
def attach_comparisons(method_tree, comparisons):
for idx in comparisons:
mid = comparisons[idx]
method_tree[mid]['indexes'].append(idx)
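A tiny self-contained demo on a hypothetical one-node tree fragment (the function is restated so the example runs on its own): the `{index: method id}` map produced by `last_comparisons` is folded into the node's `indexes` list.

```python
def attach_comparisons(method_tree, comparisons):
    for idx in comparisons:
        mid = comparisons[idx]
        method_tree[mid]['indexes'].append(idx)

# a single hypothetical node, keyed by its method call id
tree = {6: {'id': 6, 'name': 'is_digit', 'indexes': [], 'children': []}}
attach_comparisons(tree, {0: 6, 1: 6})
print(tree[6]['indexes'])  # [0, 1]
```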
Here is how one would use it. Note which method call each input index is associated with. For example, the first index is associated with the method call with id 6, which corresponds to `is_digit`.
xxxxxxxxxx
%top attach_comparisons(calc_method_tree1, calc_last_comparisons1)
xxxxxxxxxx
%top calc_method_tree1
{0: {'id': 0,
'name': None,
'children': [{'id': 1,
'name': 'main',
'indexes': [],
'children': [{'id': 2,
'name': 'parse_expr',
'indexes': [],
'children': [{'id': 3,
'name': 'parse_expr:while_1 ? [1]',
'indexes': [],
'children': [{'id': 4,
'name': 'parse_expr:if_1 = 0#[1, -1]',
'indexes': [],
'children': [{'id': 5,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 6,
'name': 'is_digit',
'indexes': [0],
'children': []},
{'id': 7,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 8,
'name': 'is_digit',
'indexes': [],
'children': []}]}]}]},
{'id': 9,
'name': 'parse_expr:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 10,
'name': 'parse_expr:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 11,
'name': 'parse_expr:while_1 ? [3]',
'indexes': [],
'children': [{'id': 12,
'name': 'parse_expr:if_1 = 2#[3, -1]',
'indexes': [],
'children': [{'id': 13,
'name': 'parse_paren',
'indexes': [2, 8],
'children': [{'id': 14,
'name': 'parse_expr',
'indexes': [],
'children': [{'id': 15,
'name': 'parse_expr:while_1 ? [1]',
'indexes': [],
'children': [{'id': 16,
'name': 'parse_expr:if_1 = 0#[1, -1]',
'indexes': [],
'children': [{'id': 17,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 18,
'name': 'is_digit',
'indexes': [3],
'children': []},
{'id': 19,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 20,
'name': 'is_digit',
'indexes': [4],
'children': []},
{'id': 21,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 22,
'name': 'is_digit',
'indexes': [],
'children': []}]}]}]},
{'id': 23,
'name': 'parse_expr:while_1 ? [2]',
'indexes': [5],
'children': [{'id': 24,
'name': 'parse_expr:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 25,
'name': 'parse_expr:while_1 ? [3]',
'indexes': [],
'children': [{'id': 26,
'name': 'parse_expr:if_1 = 0#[3, -1]',
'indexes': [],
'children': [{'id': 27,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 28,
'name': 'is_digit',
'indexes': [6],
'children': []},
{'id': 29,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 30,
'name': 'is_digit',
'indexes': [7],
'children': []},
{'id': 31,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 32,
'name': 'is_digit',
'indexes': [],
'children': []}]}]}]},
{'id': 33,
'name': 'parse_expr:while_1 ? [4]',
'indexes': [],
'children': [{'id': 34,
'name': 'parse_expr:if_1 = 3#[4, -1]',
'indexes': [],
'children': []}]}]}]}]}]},
{'id': 35,
'name': 'parse_expr:while_1 ? [4]',
'indexes': [9],
'children': [{'id': 36,
'name': 'parse_expr:if_1 = 1#[4, -1]',
'indexes': [],
'children': []}]},
{'id': 37,
'name': 'parse_expr:while_1 ? [5]',
'indexes': [],
'children': [{'id': 38,
'name': 'parse_expr:if_1 = 0#[5, -1]',
'indexes': [],
'children': [{'id': 39,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 40,
'name': 'is_digit',
'indexes': [10],
'children': []},
{'id': 41,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 42,
'name': 'is_digit',
'indexes': [],
'children': []}]}]}]},
{'id': 43,
'name': 'parse_expr:while_1 ? [6]',
'indexes': [11],
'children': [{'id': 44,
'name': 'parse_expr:if_1 = 1#[6, -1]',
'indexes': [],
'children': []}]},
{'id': 45,
'name': 'parse_expr:while_1 ? [7]',
'indexes': [],
'children': [{'id': 46,
'name': 'parse_expr:if_1 = 0#[7, -1]',
'indexes': [],
'children': [{'id': 47,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 48,
'name': 'is_digit',
'indexes': [12],
'children': []},
{'id': 49,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 50, 'name': 'is_digit', 'indexes': [13], 'children': []},
{'id': 51,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 52, 'name': 'is_digit', 'indexes': [14], 'children': []},
{'id': 53,
'name': 'parse_num:while_1 ? [3]',
'indexes': [],
'children': []}]}]}]}]}]}],
'indexes': []},
1: {'id': 1,
'name': 'main',
'indexes': [],
'children': [{'id': 2,
'name': 'parse_expr',
'indexes': [],
'children': [{'id': 3,
'name': 'parse_expr:while_1 ? [1]',
'indexes': [],
'children': [{'id': 4,
'name': 'parse_expr:if_1 = 0#[1, -1]',
'indexes': [],
'children': [{'id': 5,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 6,
'name': 'is_digit',
'indexes': [0],
'children': []},
{'id': 7,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 8, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
{'id': 9,
'name': 'parse_expr:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 10,
'name': 'parse_expr:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 11,
'name': 'parse_expr:while_1 ? [3]',
'indexes': [],
'children': [{'id': 12,
'name': 'parse_expr:if_1 = 2#[3, -1]',
'indexes': [],
'children': [{'id': 13,
'name': 'parse_paren',
'indexes': [2, 8],
'children': [{'id': 14,
'name': 'parse_expr',
'indexes': [],
'children': [{'id': 15,
'name': 'parse_expr:while_1 ? [1]',
'indexes': [],
'children': [{'id': 16,
'name': 'parse_expr:if_1 = 0#[1, -1]',
'indexes': [],
'children': [{'id': 17,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 18,
'name': 'is_digit',
'indexes': [3],
'children': []},
{'id': 19,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 20,
'name': 'is_digit',
'indexes': [4],
'children': []},
{'id': 21,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 22,
'name': 'is_digit',
'indexes': [],
'children': []}]}]}]},
{'id': 23,
'name': 'parse_expr:while_1 ? [2]',
'indexes': [5],
'children': [{'id': 24,
'name': 'parse_expr:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 25,
'name': 'parse_expr:while_1 ? [3]',
'indexes': [],
'children': [{'id': 26,
'name': 'parse_expr:if_1 = 0#[3, -1]',
'indexes': [],
'children': [{'id': 27,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 28,
'name': 'is_digit',
'indexes': [6],
'children': []},
{'id': 29,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 30,
'name': 'is_digit',
'indexes': [7],
'children': []},
{'id': 31,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 32,
'name': 'is_digit',
'indexes': [],
'children': []}]}]}]},
{'id': 33,
'name': 'parse_expr:while_1 ? [4]',
'indexes': [],
'children': [{'id': 34,
'name': 'parse_expr:if_1 = 3#[4, -1]',
'indexes': [],
'children': []}]}]}]}]}]},
{'id': 35,
'name': 'parse_expr:while_1 ? [4]',
'indexes': [9],
'children': [{'id': 36,
'name': 'parse_expr:if_1 = 1#[4, -1]',
'indexes': [],
'children': []}]},
{'id': 37,
'name': 'parse_expr:while_1 ? [5]',
'indexes': [],
'children': [{'id': 38,
'name': 'parse_expr:if_1 = 0#[5, -1]',
'indexes': [],
'children': [{'id': 39,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 40,
'name': 'is_digit',
'indexes': [10],
'children': []},
{'id': 41,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 42, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
{'id': 43,
'name': 'parse_expr:while_1 ? [6]',
'indexes': [11],
'children': [{'id': 44,
'name': 'parse_expr:if_1 = 1#[6, -1]',
'indexes': [],
'children': []}]},
{'id': 45,
'name': 'parse_expr:while_1 ? [7]',
'indexes': [],
'children': [{'id': 46,
'name': 'parse_expr:if_1 = 0#[7, -1]',
'indexes': [],
'children': [{'id': 47,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 48,
'name': 'is_digit',
'indexes': [12],
'children': []},
{'id': 49,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 50, 'name': 'is_digit', 'indexes': [13], 'children': []},
{'id': 51,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 52, 'name': 'is_digit', 'indexes': [14], 'children': []},
{'id': 53,
'name': 'parse_num:while_1 ? [3]',
'indexes': [],
'children': []}]}]}]}]}]},
2: {'id': 2,
'name': 'parse_expr',
'indexes': [],
'children': [{'id': 3,
'name': 'parse_expr:while_1 ? [1]',
'indexes': [],
'children': [{'id': 4,
'name': 'parse_expr:if_1 = 0#[1, -1]',
'indexes': [],
'children': [{'id': 5,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 6,
'name': 'is_digit',
'indexes': [0],
'children': []},
{'id': 7,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 8, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
{'id': 9,
'name': 'parse_expr:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 10,
'name': 'parse_expr:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 11,
'name': 'parse_expr:while_1 ? [3]',
'indexes': [],
'children': [{'id': 12,
'name': 'parse_expr:if_1 = 2#[3, -1]',
'indexes': [],
'children': [{'id': 13,
'name': 'parse_paren',
'indexes': [2, 8],
'children': [{'id': 14,
'name': 'parse_expr',
'indexes': [],
'children': [{'id': 15,
'name': 'parse_expr:while_1 ? [1]',
'indexes': [],
'children': [{'id': 16,
'name': 'parse_expr:if_1 = 0#[1, -1]',
'indexes': [],
'children': [{'id': 17,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 18,
'name': 'is_digit',
'indexes': [3],
'children': []},
{'id': 19,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 20,
'name': 'is_digit',
'indexes': [4],
'children': []},
{'id': 21,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 22,
'name': 'is_digit',
'indexes': [],
'children': []}]}]}]},
{'id': 23,
'name': 'parse_expr:while_1 ? [2]',
'indexes': [5],
'children': [{'id': 24,
'name': 'parse_expr:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 25,
'name': 'parse_expr:while_1 ? [3]',
'indexes': [],
'children': [{'id': 26,
'name': 'parse_expr:if_1 = 0#[3, -1]',
'indexes': [],
'children': [{'id': 27,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 28,
'name': 'is_digit',
'indexes': [6],
'children': []},
{'id': 29,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 30,
'name': 'is_digit',
'indexes': [7],
'children': []},
{'id': 31,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 32,
'name': 'is_digit',
'indexes': [],
'children': []}]}]}]},
{'id': 33,
'name': 'parse_expr:while_1 ? [4]',
'indexes': [],
'children': [{'id': 34,
'name': 'parse_expr:if_1 = 3#[4, -1]',
'indexes': [],
'children': []}]}]}]}]}]},
{'id': 35,
'name': 'parse_expr:while_1 ? [4]',
'indexes': [9],
'children': [{'id': 36,
'name': 'parse_expr:if_1 = 1#[4, -1]',
'indexes': [],
'children': []}]},
{'id': 37,
'name': 'parse_expr:while_1 ? [5]',
'indexes': [],
'children': [{'id': 38,
'name': 'parse_expr:if_1 = 0#[5, -1]',
'indexes': [],
'children': [{'id': 39,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 40,
'name': 'is_digit',
'indexes': [10],
'children': []},
{'id': 41,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 42, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
{'id': 43,
'name': 'parse_expr:while_1 ? [6]',
'indexes': [11],
'children': [{'id': 44,
'name': 'parse_expr:if_1 = 1#[6, -1]',
'indexes': [],
'children': []}]},
{'id': 45,
'name': 'parse_expr:while_1 ? [7]',
'indexes': [],
'children': [{'id': 46,
'name': 'parse_expr:if_1 = 0#[7, -1]',
'indexes': [],
'children': [{'id': 47,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 48,
'name': 'is_digit',
'indexes': [12],
'children': []},
{'id': 49,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 50, 'name': 'is_digit', 'indexes': [13], 'children': []},
{'id': 51,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 52, 'name': 'is_digit', 'indexes': [14], 'children': []},
{'id': 53,
'name': 'parse_num:while_1 ? [3]',
'indexes': [],
'children': []}]}]}]}]},
3: {'id': 3,
'name': 'parse_expr:while_1 ? [1]',
'indexes': [],
'children': [{'id': 4,
'name': 'parse_expr:if_1 = 0#[1, -1]',
'indexes': [],
'children': [{'id': 5,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 6,
'name': 'is_digit',
'indexes': [0],
'children': []},
{'id': 7,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 8, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
9: {'id': 9,
'name': 'parse_expr:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 10,
'name': 'parse_expr:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
11: {'id': 11,
'name': 'parse_expr:while_1 ? [3]',
'indexes': [],
'children': [{'id': 12,
'name': 'parse_expr:if_1 = 2#[3, -1]',
'indexes': [],
'children': [{'id': 13,
'name': 'parse_paren',
'indexes': [2, 8],
'children': [{'id': 14,
'name': 'parse_expr',
'indexes': [],
'children': [{'id': 15,
'name': 'parse_expr:while_1 ? [1]',
'indexes': [],
'children': [{'id': 16,
'name': 'parse_expr:if_1 = 0#[1, -1]',
'indexes': [],
'children': [{'id': 17,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 18,
'name': 'is_digit',
'indexes': [3],
'children': []},
{'id': 19,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []},
{'id': 21,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 22,
'name': 'is_digit',
'indexes': [],
'children': []}]}]}]},
{'id': 23,
'name': 'parse_expr:while_1 ? [2]',
'indexes': [5],
'children': [{'id': 24,
'name': 'parse_expr:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 25,
'name': 'parse_expr:while_1 ? [3]',
'indexes': [],
'children': [{'id': 26,
'name': 'parse_expr:if_1 = 0#[3, -1]',
'indexes': [],
'children': [{'id': 27,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 28,
'name': 'is_digit',
'indexes': [6],
'children': []},
{'id': 29,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 30, 'name': 'is_digit', 'indexes': [7], 'children': []},
{'id': 31,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 32,
'name': 'is_digit',
'indexes': [],
'children': []}]}]}]},
{'id': 33,
'name': 'parse_expr:while_1 ? [4]',
'indexes': [],
'children': [{'id': 34,
'name': 'parse_expr:if_1 = 3#[4, -1]',
'indexes': [],
'children': []}]}]}]}]}]},
35: {'id': 35,
'name': 'parse_expr:while_1 ? [4]',
'indexes': [9],
'children': [{'id': 36,
'name': 'parse_expr:if_1 = 1#[4, -1]',
'indexes': [],
'children': []}]},
37: {'id': 37,
'name': 'parse_expr:while_1 ? [5]',
'indexes': [],
'children': [{'id': 38,
'name': 'parse_expr:if_1 = 0#[5, -1]',
'indexes': [],
'children': [{'id': 39,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 40,
'name': 'is_digit',
'indexes': [10],
'children': []},
{'id': 41,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 42, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
43: {'id': 43,
'name': 'parse_expr:while_1 ? [6]',
'indexes': [11],
'children': [{'id': 44,
'name': 'parse_expr:if_1 = 1#[6, -1]',
'indexes': [],
'children': []}]},
45: {'id': 45,
'name': 'parse_expr:while_1 ? [7]',
'indexes': [],
'children': [{'id': 46,
'name': 'parse_expr:if_1 = 0#[7, -1]',
'indexes': [],
'children': [{'id': 47,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 48,
'name': 'is_digit',
'indexes': [12],
'children': []},
{'id': 49,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 50, 'name': 'is_digit', 'indexes': [13], 'children': []},
{'id': 51,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 52, 'name': 'is_digit', 'indexes': [14], 'children': []},
{'id': 53,
'name': 'parse_num:while_1 ? [3]',
'indexes': [],
'children': []}]}]}]},
4: {'id': 4,
'name': 'parse_expr:if_1 = 0#[1, -1]',
'indexes': [],
'children': [{'id': 5,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 6, 'name': 'is_digit', 'indexes': [0], 'children': []},
{'id': 7,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 8, 'name': 'is_digit', 'indexes': [], 'children': []}]}]},
5: {'id': 5,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 6, 'name': 'is_digit', 'indexes': [0], 'children': []},
{'id': 7, 'name': 'parse_num:while_1 ? [1]', 'indexes': [], 'children': []},
{'id': 8, 'name': 'is_digit', 'indexes': [], 'children': []}]},
6: {'id': 6, 'name': 'is_digit', 'indexes': [0], 'children': []},
7: {'id': 7,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
8: {'id': 8, 'name': 'is_digit', 'indexes': [], 'children': []},
10: {'id': 10,
'name': 'parse_expr:if_1 = 1#[2, -1]',
'indexes': [],
'children': []},
12: {'id': 12,
'name': 'parse_expr:if_1 = 2#[3, -1]',
'indexes': [],
'children': [{'id': 13,
'name': 'parse_paren',
'indexes': [2, 8],
'children': [{'id': 14,
'name': 'parse_expr',
'indexes': [],
'children': [{'id': 15,
'name': 'parse_expr:while_1 ? [1]',
'indexes': [],
'children': [{'id': 16,
'name': 'parse_expr:if_1 = 0#[1, -1]',
'indexes': [],
'children': [{'id': 17,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 18,
'name': 'is_digit',
'indexes': [3],
'children': []},
{'id': 19,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []},
{'id': 21,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 22,
'name': 'is_digit',
'indexes': [],
'children': []}]}]}]},
{'id': 23,
'name': 'parse_expr:while_1 ? [2]',
'indexes': [5],
'children': [{'id': 24,
'name': 'parse_expr:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 25,
'name': 'parse_expr:while_1 ? [3]',
'indexes': [],
'children': [{'id': 26,
'name': 'parse_expr:if_1 = 0#[3, -1]',
'indexes': [],
'children': [{'id': 27,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 28,
'name': 'is_digit',
'indexes': [6],
'children': []},
{'id': 29,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 30, 'name': 'is_digit', 'indexes': [7], 'children': []},
{'id': 31,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 32,
'name': 'is_digit',
'indexes': [],
'children': []}]}]}]},
{'id': 33,
'name': 'parse_expr:while_1 ? [4]',
'indexes': [],
'children': [{'id': 34,
'name': 'parse_expr:if_1 = 3#[4, -1]',
'indexes': [],
'children': []}]}]}]}]},
13: {'id': 13,
'name': 'parse_paren',
'indexes': [2, 8],
'children': [{'id': 14,
'name': 'parse_expr',
'indexes': [],
'children': [{'id': 15,
'name': 'parse_expr:while_1 ? [1]',
'indexes': [],
'children': [{'id': 16,
'name': 'parse_expr:if_1 = 0#[1, -1]',
'indexes': [],
'children': [{'id': 17,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 18,
'name': 'is_digit',
'indexes': [3],
'children': []},
{'id': 19,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []},
{'id': 21,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 22, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
{'id': 23,
'name': 'parse_expr:while_1 ? [2]',
'indexes': [5],
'children': [{'id': 24,
'name': 'parse_expr:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 25,
'name': 'parse_expr:while_1 ? [3]',
'indexes': [],
'children': [{'id': 26,
'name': 'parse_expr:if_1 = 0#[3, -1]',
'indexes': [],
'children': [{'id': 27,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 28,
'name': 'is_digit',
'indexes': [6],
'children': []},
{'id': 29,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 30, 'name': 'is_digit', 'indexes': [7], 'children': []},
{'id': 31,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 32, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
{'id': 33,
'name': 'parse_expr:while_1 ? [4]',
'indexes': [],
'children': [{'id': 34,
'name': 'parse_expr:if_1 = 3#[4, -1]',
'indexes': [],
'children': []}]}]}]},
14: {'id': 14,
'name': 'parse_expr',
'indexes': [],
'children': [{'id': 15,
'name': 'parse_expr:while_1 ? [1]',
'indexes': [],
'children': [{'id': 16,
'name': 'parse_expr:if_1 = 0#[1, -1]',
'indexes': [],
'children': [{'id': 17,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 18,
'name': 'is_digit',
'indexes': [3],
'children': []},
{'id': 19,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []},
{'id': 21,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 22, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
{'id': 23,
'name': 'parse_expr:while_1 ? [2]',
'indexes': [5],
'children': [{'id': 24,
'name': 'parse_expr:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 25,
'name': 'parse_expr:while_1 ? [3]',
'indexes': [],
'children': [{'id': 26,
'name': 'parse_expr:if_1 = 0#[3, -1]',
'indexes': [],
'children': [{'id': 27,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 28,
'name': 'is_digit',
'indexes': [6],
'children': []},
{'id': 29,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 30, 'name': 'is_digit', 'indexes': [7], 'children': []},
{'id': 31,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 32, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
{'id': 33,
'name': 'parse_expr:while_1 ? [4]',
'indexes': [],
'children': [{'id': 34,
'name': 'parse_expr:if_1 = 3#[4, -1]',
'indexes': [],
'children': []}]}]},
15: {'id': 15,
'name': 'parse_expr:while_1 ? [1]',
'indexes': [],
'children': [{'id': 16,
'name': 'parse_expr:if_1 = 0#[1, -1]',
'indexes': [],
'children': [{'id': 17,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 18,
'name': 'is_digit',
'indexes': [3],
'children': []},
{'id': 19,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []},
{'id': 21,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 22, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
23: {'id': 23,
'name': 'parse_expr:while_1 ? [2]',
'indexes': [5],
'children': [{'id': 24,
'name': 'parse_expr:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
25: {'id': 25,
'name': 'parse_expr:while_1 ? [3]',
'indexes': [],
'children': [{'id': 26,
'name': 'parse_expr:if_1 = 0#[3, -1]',
'indexes': [],
'children': [{'id': 27,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 28,
'name': 'is_digit',
'indexes': [6],
'children': []},
{'id': 29,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 30, 'name': 'is_digit', 'indexes': [7], 'children': []},
{'id': 31,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 32, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
33: {'id': 33,
'name': 'parse_expr:while_1 ? [4]',
'indexes': [],
'children': [{'id': 34,
'name': 'parse_expr:if_1 = 3#[4, -1]',
'indexes': [],
'children': []}]},
16: {'id': 16,
'name': 'parse_expr:if_1 = 0#[1, -1]',
'indexes': [],
'children': [{'id': 17,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 18,
'name': 'is_digit',
'indexes': [3],
'children': []},
{'id': 19,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []},
{'id': 21,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 22, 'name': 'is_digit', 'indexes': [], 'children': []}]}]},
17: {'id': 17,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 18, 'name': 'is_digit', 'indexes': [3], 'children': []},
{'id': 19,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []},
{'id': 21,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 22, 'name': 'is_digit', 'indexes': [], 'children': []}]},
18: {'id': 18, 'name': 'is_digit', 'indexes': [3], 'children': []},
19: {'id': 19,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
20: {'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []},
21: {'id': 21,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
22: {'id': 22, 'name': 'is_digit', 'indexes': [], 'children': []},
24: {'id': 24,
'name': 'parse_expr:if_1 = 1#[2, -1]',
'indexes': [],
'children': []},
26: {'id': 26,
'name': 'parse_expr:if_1 = 0#[3, -1]',
'indexes': [],
'children': [{'id': 27,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 28,
'name': 'is_digit',
'indexes': [6],
'children': []},
{'id': 29,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 30, 'name': 'is_digit', 'indexes': [7], 'children': []},
{'id': 31,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 32, 'name': 'is_digit', 'indexes': [], 'children': []}]}]},
27: {'id': 27,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 28, 'name': 'is_digit', 'indexes': [6], 'children': []},
{'id': 29,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 30, 'name': 'is_digit', 'indexes': [7], 'children': []},
{'id': 31,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 32, 'name': 'is_digit', 'indexes': [], 'children': []}]},
28: {'id': 28, 'name': 'is_digit', 'indexes': [6], 'children': []},
29: {'id': 29,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
30: {'id': 30, 'name': 'is_digit', 'indexes': [7], 'children': []},
31: {'id': 31,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
32: {'id': 32, 'name': 'is_digit', 'indexes': [], 'children': []},
34: {'id': 34,
'name': 'parse_expr:if_1 = 3#[4, -1]',
'indexes': [],
'children': []},
36: {'id': 36,
'name': 'parse_expr:if_1 = 1#[4, -1]',
'indexes': [],
'children': []},
38: {'id': 38,
'name': 'parse_expr:if_1 = 0#[5, -1]',
'indexes': [],
'children': [{'id': 39,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 40,
'name': 'is_digit',
'indexes': [10],
'children': []},
{'id': 41,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 42, 'name': 'is_digit', 'indexes': [], 'children': []}]}]},
39: {'id': 39,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 40, 'name': 'is_digit', 'indexes': [10], 'children': []},
{'id': 41,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 42, 'name': 'is_digit', 'indexes': [], 'children': []}]},
40: {'id': 40, 'name': 'is_digit', 'indexes': [10], 'children': []},
41: {'id': 41,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
42: {'id': 42, 'name': 'is_digit', 'indexes': [], 'children': []},
44: {'id': 44,
'name': 'parse_expr:if_1 = 1#[6, -1]',
'indexes': [],
'children': []},
46: {'id': 46,
'name': 'parse_expr:if_1 = 0#[7, -1]',
'indexes': [],
'children': [{'id': 47,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 48,
'name': 'is_digit',
'indexes': [12],
'children': []},
{'id': 49,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 50, 'name': 'is_digit', 'indexes': [13], 'children': []},
{'id': 51,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 52, 'name': 'is_digit', 'indexes': [14], 'children': []},
{'id': 53,
'name': 'parse_num:while_1 ? [3]',
'indexes': [],
'children': []}]}]},
47: {'id': 47,
'name': 'parse_num',
'indexes': [],
'children': [{'id': 48, 'name': 'is_digit', 'indexes': [12], 'children': []},
{'id': 49,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
{'id': 50, 'name': 'is_digit', 'indexes': [13], 'children': []},
{'id': 51,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
{'id': 52, 'name': 'is_digit', 'indexes': [14], 'children': []},
{'id': 53,
'name': 'parse_num:while_1 ? [3]',
'indexes': [],
'children': []}]},
48: {'id': 48, 'name': 'is_digit', 'indexes': [12], 'children': []},
49: {'id': 49,
'name': 'parse_num:while_1 ? [1]',
'indexes': [],
'children': []},
50: {'id': 50, 'name': 'is_digit', 'indexes': [13], 'children': []},
51: {'id': 51,
'name': 'parse_num:while_1 ? [2]',
'indexes': [],
'children': []},
52: {'id': 52, 'name': 'is_digit', 'indexes': [14], 'children': []},
53: {'id': 53,
'name': 'parse_num:while_1 ? [3]',
'indexes': [],
'children': []}}
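Every node in the dump above has the same shape: an `'id'`, a `'name'` (a method call, a `while_1 ? [n]` loop iteration, or an `if_1 = v#[...]` comparison), the `'indexes'` of input characters it consumed directly, and its `'children'`. As a minimal sketch of working with this structure, the helper below (our own illustration, not part of Mimid) collects every input index consumed in a subtree, which is how one can tell which substring of the input a method "owns":

```python
def consumed_indexes(node):
    """Collect, in traversal order, all input indexes consumed in the
    subtree rooted at a method-tree node of the shape dumped above
    ('id', 'name', 'indexes', 'children')."""
    idxs = list(node['indexes'])
    for child in node['children']:
        idxs.extend(consumed_indexes(child))
    return idxs

# A tiny fragment mirroring the parse_num subtree from the dump above.
parse_num = {'id': 17, 'name': 'parse_num', 'indexes': [],
             'children': [
                 {'id': 18, 'name': 'is_digit', 'indexes': [3], 'children': []},
                 {'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []}]}

print(consumed_indexes(parse_num))  # [3, 4]
```

In the full tree, a method node whose subtree covers a contiguous index range is a candidate nonterminal spanning exactly that substring of the input.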
%top attach_comparisons(mathexpr_method_tree1, mathexpr_last_comparisons1)
%top mathexpr_method_tree1
{0: {'id': 0,
'name': None,
'children': [{'id': 1,
'name': 'main',
'indexes': [],
'children': [{'id': 2, 'name': '__init__', 'indexes': [], 'children': []},
{'id': 3,
'name': 'getValue',
'indexes': [],
'children': [{'id': 4,
'name': 'parseExpression',
'indexes': [],
'children': [{'id': 5,
'name': 'parseAddition',
'indexes': [],
'children': [{'id': 6,
'name': 'parseMultiplication',
'indexes': [],
'children': [{'id': 7,
'name': 'parseParenthesis',
'indexes': [],
'children': [{'id': 8,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 9,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 10,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 11,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 12,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 13, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 14,
'name': 'parseParenthesis:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 15,
'name': 'parseNegative',
'indexes': [],
'children': [{'id': 16,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 17,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 18,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 19,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 20,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 22,
'name': 'parseNegative:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 23,
'name': 'parseValue',
'indexes': [],
'children': [{'id': 24,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 25,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 26,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 27,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 28,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 29,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 30,
'name': 'parseValue:if_1 = 0#[-1]',
'indexes': [],
'children': [{'id': 31,
'name': 'parseNumber',
'indexes': [],
'children': [{'id': 32,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 33,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 34,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 35,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 36,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 37,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 38,
'name': 'parseNumber:while_1 ? [1]',
'indexes': [0],
'children': [{'id': 39,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 40,
'name': 'parseNumber:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
{'id': 41,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 42,
'name': 'parseNumber:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 43,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 44,
'name': 'parseNumber:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 45,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 46,
'name': 'parseNumber:while_1 ? [3]',
'indexes': [2],
'children': [{'id': 47,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 48,
'name': 'parseNumber:if_1 = 1#[3, -1]',
'indexes': [],
'children': []}]},
{'id': 49,
'name': 'hasNext',
'indexes': [],
'children': []}]}]}]}]}]}]}]},
{'id': 50,
'name': 'parseMultiplication:while_1 ? [1]',
'indexes': [],
'children': [{'id': 51,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 52,
'name': 'hasNext',
'indexes': [],
'children': []}]},
{'id': 53, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 54,
'name': 'parseMultiplication:if_1 = 2#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 55,
'name': 'parseAddition:while_1 ? [1]',
'indexes': [],
'children': [{'id': 56,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 57,
'name': 'hasNext',
'indexes': [],
'children': []}]},
{'id': 58, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 59,
'name': 'parseAddition:if_1 = 2#[1, -1]',
'indexes': [],
'children': []}]}]}]},
{'id': 60,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 61,
'name': 'hasNext',
'indexes': [],
'children': []}]},
{'id': 62, 'name': 'hasNext', 'indexes': [], 'children': []}]}]}],
'indexes': []},
1: {'id': 1,
'name': 'main',
'indexes': [],
'children': [{'id': 2, 'name': '__init__', 'indexes': [], 'children': []},
{'id': 3,
'name': 'getValue',
'indexes': [],
'children': [{'id': 4,
'name': 'parseExpression',
'indexes': [],
'children': [{'id': 5,
'name': 'parseAddition',
'indexes': [],
'children': [{'id': 6,
'name': 'parseMultiplication',
'indexes': [],
'children': [{'id': 7,
'name': 'parseParenthesis',
'indexes': [],
'children': [{'id': 8,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 9,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 10,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 11,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 12,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 13, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 14,
'name': 'parseParenthesis:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 15,
'name': 'parseNegative',
'indexes': [],
'children': [{'id': 16,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 17,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 18,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 19,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 20,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 22,
'name': 'parseNegative:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 23,
'name': 'parseValue',
'indexes': [],
'children': [{'id': 24,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 25,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 26,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 27,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 28,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 30,
'name': 'parseValue:if_1 = 0#[-1]',
'indexes': [],
'children': [{'id': 31,
'name': 'parseNumber',
'indexes': [],
'children': [{'id': 32,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 33,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 34,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 35,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 36,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 37,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 38,
'name': 'parseNumber:while_1 ? [1]',
'indexes': [0],
'children': [{'id': 39,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 40,
'name': 'parseNumber:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
{'id': 41,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 42,
'name': 'parseNumber:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 43,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 44,
'name': 'parseNumber:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 45,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 46,
'name': 'parseNumber:while_1 ? [3]',
'indexes': [2],
'children': [{'id': 47,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 48,
'name': 'parseNumber:if_1 = 1#[3, -1]',
'indexes': [],
'children': []}]},
{'id': 49,
'name': 'hasNext',
'indexes': [],
'children': []}]}]}]}]}]}]}]},
{'id': 50,
'name': 'parseMultiplication:while_1 ? [1]',
'indexes': [],
'children': [{'id': 51,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 52,
'name': 'hasNext',
'indexes': [],
'children': []}]},
{'id': 53, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 54,
'name': 'parseMultiplication:if_1 = 2#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 55,
'name': 'parseAddition:while_1 ? [1]',
'indexes': [],
'children': [{'id': 56,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 57,
'name': 'hasNext',
'indexes': [],
'children': []}]},
{'id': 58, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 59,
'name': 'parseAddition:if_1 = 2#[1, -1]',
'indexes': [],
'children': []}]}]}]},
{'id': 60,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 61,
'name': 'hasNext',
'indexes': [],
'children': []}]},
{'id': 62, 'name': 'hasNext', 'indexes': [], 'children': []}]}]},
2: {'id': 2, 'name': '__init__', 'indexes': [], 'children': []},
3: {'id': 3,
'name': 'getValue',
'indexes': [],
'children': [{'id': 4,
'name': 'parseExpression',
'indexes': [],
'children': [{'id': 5,
'name': 'parseAddition',
'indexes': [],
'children': [{'id': 6,
'name': 'parseMultiplication',
'indexes': [],
'children': [{'id': 7,
'name': 'parseParenthesis',
'indexes': [],
'children': [{'id': 8,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 9,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 10,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 11,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 12,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 13, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 14,
'name': 'parseParenthesis:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 15,
'name': 'parseNegative',
'indexes': [],
'children': [{'id': 16,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 17,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 18,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 19,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 20,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 22,
'name': 'parseNegative:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 23,
'name': 'parseValue',
'indexes': [],
'children': [{'id': 24,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 25,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 26,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 27,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 28,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 30,
'name': 'parseValue:if_1 = 0#[-1]',
'indexes': [],
'children': [{'id': 31,
'name': 'parseNumber',
'indexes': [],
'children': [{'id': 32,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 33,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 34,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 35,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 36,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 37,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 38,
'name': 'parseNumber:while_1 ? [1]',
'indexes': [0],
'children': [{'id': 39,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 40,
'name': 'parseNumber:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
{'id': 41,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 42,
'name': 'parseNumber:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 43,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 44,
'name': 'parseNumber:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 45,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 46,
'name': 'parseNumber:while_1 ? [3]',
'indexes': [2],
'children': [{'id': 47,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 48,
'name': 'parseNumber:if_1 = 1#[3, -1]',
'indexes': [],
'children': []}]},
{'id': 49,
'name': 'hasNext',
'indexes': [],
'children': []}]}]}]}]}]}]}]},
{'id': 50,
'name': 'parseMultiplication:while_1 ? [1]',
'indexes': [],
'children': [{'id': 51,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 52,
'name': 'hasNext',
'indexes': [],
'children': []}]},
{'id': 53, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 54,
'name': 'parseMultiplication:if_1 = 2#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 55,
'name': 'parseAddition:while_1 ? [1]',
'indexes': [],
'children': [{'id': 56,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 57,
'name': 'hasNext',
'indexes': [],
'children': []}]},
{'id': 58, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 59,
'name': 'parseAddition:if_1 = 2#[1, -1]',
'indexes': [],
'children': []}]}]}]},
{'id': 60,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 61,
'name': 'hasNext',
'indexes': [],
'children': []}]},
{'id': 62, 'name': 'hasNext', 'indexes': [], 'children': []}]},
4: {'id': 4,
'name': 'parseExpression',
'indexes': [],
'children': [{'id': 5,
'name': 'parseAddition',
'indexes': [],
'children': [{'id': 6,
'name': 'parseMultiplication',
'indexes': [],
'children': [{'id': 7,
'name': 'parseParenthesis',
'indexes': [],
'children': [{'id': 8,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 9,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 10,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 11,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 12,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 13, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 14,
'name': 'parseParenthesis:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 15,
'name': 'parseNegative',
'indexes': [],
'children': [{'id': 16,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 17,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 18,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 19,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 20,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 22,
'name': 'parseNegative:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 23,
'name': 'parseValue',
'indexes': [],
'children': [{'id': 24,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 25,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 26,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 27,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 28,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 30,
'name': 'parseValue:if_1 = 0#[-1]',
'indexes': [],
'children': [{'id': 31,
'name': 'parseNumber',
'indexes': [],
'children': [{'id': 32,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 33,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 34,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 35,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 36,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 37,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 38,
'name': 'parseNumber:while_1 ? [1]',
'indexes': [0],
'children': [{'id': 39,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 40,
'name': 'parseNumber:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
{'id': 41,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 42,
'name': 'parseNumber:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 43,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 44,
'name': 'parseNumber:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 45,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 46,
'name': 'parseNumber:while_1 ? [3]',
'indexes': [2],
'children': [{'id': 47,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 48,
'name': 'parseNumber:if_1 = 1#[3, -1]',
'indexes': [],
'children': []}]},
{'id': 49,
'name': 'hasNext',
'indexes': [],
'children': []}]}]}]}]}]}]}]},
{'id': 50,
'name': 'parseMultiplication:while_1 ? [1]',
'indexes': [],
'children': [{'id': 51,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 52,
'name': 'hasNext',
'indexes': [],
'children': []}]},
{'id': 53, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 54,
'name': 'parseMultiplication:if_1 = 2#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 55,
'name': 'parseAddition:while_1 ? [1]',
'indexes': [],
'children': [{'id': 56,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 57,
'name': 'hasNext',
'indexes': [],
'children': []}]},
{'id': 58, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 59,
'name': 'parseAddition:if_1 = 2#[1, -1]',
'indexes': [],
'children': []}]}]}]},
60: {'id': 60,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 61, 'name': 'hasNext', 'indexes': [], 'children': []}]},
62: {'id': 62, 'name': 'hasNext', 'indexes': [], 'children': []},
5: {'id': 5,
'name': 'parseAddition',
'indexes': [],
'children': [{'id': 6,
'name': 'parseMultiplication',
'indexes': [],
'children': [{'id': 7,
'name': 'parseParenthesis',
'indexes': [],
'children': [{'id': 8,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 9,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 10,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 11,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 12,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 13, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 14,
'name': 'parseParenthesis:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 15,
'name': 'parseNegative',
'indexes': [],
'children': [{'id': 16,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 17,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 18,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 19,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 20,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 22,
'name': 'parseNegative:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 23,
'name': 'parseValue',
'indexes': [],
'children': [{'id': 24,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 25,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 26,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 27,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 28,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 30,
'name': 'parseValue:if_1 = 0#[-1]',
'indexes': [],
'children': [{'id': 31,
'name': 'parseNumber',
'indexes': [],
'children': [{'id': 32,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 33,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 34,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 35,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 36,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 37,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 38,
'name': 'parseNumber:while_1 ? [1]',
'indexes': [0],
'children': [{'id': 39,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 40,
'name': 'parseNumber:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
{'id': 41,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 42,
'name': 'parseNumber:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 43,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 44,
'name': 'parseNumber:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 45,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 46,
'name': 'parseNumber:while_1 ? [3]',
'indexes': [2],
'children': [{'id': 47,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 48,
'name': 'parseNumber:if_1 = 1#[3, -1]',
'indexes': [],
'children': []}]},
{'id': 49,
'name': 'hasNext',
'indexes': [],
'children': []}]}]}]}]}]}]}]},
{'id': 50,
'name': 'parseMultiplication:while_1 ? [1]',
'indexes': [],
'children': [{'id': 51,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 52,
'name': 'hasNext',
'indexes': [],
'children': []}]},
{'id': 53, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 54,
'name': 'parseMultiplication:if_1 = 2#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 55,
'name': 'parseAddition:while_1 ? [1]',
'indexes': [],
'children': [{'id': 56,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 57,
'name': 'hasNext',
'indexes': [],
'children': []}]},
{'id': 58, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 59,
'name': 'parseAddition:if_1 = 2#[1, -1]',
'indexes': [],
'children': []}]}]},
6: {'id': 6,
'name': 'parseMultiplication',
'indexes': [],
'children': [{'id': 7,
'name': 'parseParenthesis',
'indexes': [],
'children': [{'id': 8,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 9, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 10,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 11, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 12,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 13, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 14,
'name': 'parseParenthesis:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 15,
'name': 'parseNegative',
'indexes': [],
'children': [{'id': 16,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 17,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 18,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 19,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 20,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 22,
'name': 'parseNegative:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 23,
'name': 'parseValue',
'indexes': [],
'children': [{'id': 24,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 25,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 26,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 27,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 28,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 30,
'name': 'parseValue:if_1 = 0#[-1]',
'indexes': [],
'children': [{'id': 31,
'name': 'parseNumber',
'indexes': [],
'children': [{'id': 32,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 33,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 34,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 35,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 36,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 38,
'name': 'parseNumber:while_1 ? [1]',
'indexes': [0],
'children': [{'id': 39,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 40,
'name': 'parseNumber:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
{'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 42,
'name': 'parseNumber:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 43,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 44,
'name': 'parseNumber:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 46,
'name': 'parseNumber:while_1 ? [3]',
'indexes': [2],
'children': [{'id': 47,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 48,
'name': 'parseNumber:if_1 = 1#[3, -1]',
'indexes': [],
'children': []}]},
{'id': 49,
'name': 'hasNext',
'indexes': [],
'children': []}]}]}]}]}]}]}]},
{'id': 50,
'name': 'parseMultiplication:while_1 ? [1]',
'indexes': [],
'children': [{'id': 51,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 52,
'name': 'hasNext',
'indexes': [],
'children': []}]},
{'id': 53, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 54,
'name': 'parseMultiplication:if_1 = 2#[1, -1]',
'indexes': [],
'children': []}]}]},
55: {'id': 55,
'name': 'parseAddition:while_1 ? [1]',
'indexes': [],
'children': [{'id': 56,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 57,
'name': 'hasNext',
'indexes': [],
'children': []}]},
{'id': 58, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 59,
'name': 'parseAddition:if_1 = 2#[1, -1]',
'indexes': [],
'children': []}]},
7: {'id': 7,
'name': 'parseParenthesis',
'indexes': [],
'children': [{'id': 8,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 9, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 10,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 11, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 12,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 13, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 14,
'name': 'parseParenthesis:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 15,
'name': 'parseNegative',
'indexes': [],
'children': [{'id': 16,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 17,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 18,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 19,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 20,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 22,
'name': 'parseNegative:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 23,
'name': 'parseValue',
'indexes': [],
'children': [{'id': 24,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 25,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 26,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 27,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 28,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 30,
'name': 'parseValue:if_1 = 0#[-1]',
'indexes': [],
'children': [{'id': 31,
'name': 'parseNumber',
'indexes': [],
'children': [{'id': 32,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 33,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 34,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 35,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 36,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 38,
'name': 'parseNumber:while_1 ? [1]',
'indexes': [0],
'children': [{'id': 39,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 40,
'name': 'parseNumber:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
{'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 42,
'name': 'parseNumber:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 43,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 44,
'name': 'parseNumber:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 46,
'name': 'parseNumber:while_1 ? [3]',
'indexes': [2],
'children': [{'id': 47,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 48,
'name': 'parseNumber:if_1 = 1#[3, -1]',
'indexes': [],
'children': []}]},
{'id': 49,
'name': 'hasNext',
'indexes': [],
'children': []}]}]}]}]}]}]}]},
50: {'id': 50,
'name': 'parseMultiplication:while_1 ? [1]',
'indexes': [],
'children': [{'id': 51,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 52,
'name': 'hasNext',
'indexes': [],
'children': []}]},
{'id': 53, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 54,
'name': 'parseMultiplication:if_1 = 2#[1, -1]',
'indexes': [],
'children': []}]},
8: {'id': 8,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 9, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 10,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 11, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 12,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
13: {'id': 13, 'name': 'peek', 'indexes': [], 'children': []},
14: {'id': 14,
'name': 'parseParenthesis:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 15,
'name': 'parseNegative',
'indexes': [],
'children': [{'id': 16,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 17,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 18,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 19, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 20,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 22,
'name': 'parseNegative:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 23,
'name': 'parseValue',
'indexes': [],
'children': [{'id': 24,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 25,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 26,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 27,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 28,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 30,
'name': 'parseValue:if_1 = 0#[-1]',
'indexes': [],
'children': [{'id': 31,
'name': 'parseNumber',
'indexes': [],
'children': [{'id': 32,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 33,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 34,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 35,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 36,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 38,
'name': 'parseNumber:while_1 ? [1]',
'indexes': [0],
'children': [{'id': 39,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 40,
'name': 'parseNumber:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
{'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 42,
'name': 'parseNumber:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 43,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 44,
'name': 'parseNumber:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 46,
'name': 'parseNumber:while_1 ? [3]',
'indexes': [2],
'children': [{'id': 47,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 48,
'name': 'parseNumber:if_1 = 1#[3, -1]',
'indexes': [],
'children': []}]},
{'id': 49,
'name': 'hasNext',
'indexes': [],
'children': []}]}]}]}]}]}]},
9: {'id': 9, 'name': 'hasNext', 'indexes': [], 'children': []},
10: {'id': 10,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 11, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 12,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
11: {'id': 11, 'name': 'peek', 'indexes': [], 'children': []},
12: {'id': 12,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []},
15: {'id': 15,
'name': 'parseNegative',
'indexes': [],
'children': [{'id': 16,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 17, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 18,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 19, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 20,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 22,
'name': 'parseNegative:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 23,
'name': 'parseValue',
'indexes': [],
'children': [{'id': 24,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 25,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 26,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 27,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 28,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 30,
'name': 'parseValue:if_1 = 0#[-1]',
'indexes': [],
'children': [{'id': 31,
'name': 'parseNumber',
'indexes': [],
'children': [{'id': 32,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 33,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 34,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 35,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 36,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 38,
'name': 'parseNumber:while_1 ? [1]',
'indexes': [0],
'children': [{'id': 39,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 40,
'name': 'parseNumber:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
{'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 42,
'name': 'parseNumber:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 43,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 44,
'name': 'parseNumber:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 46,
'name': 'parseNumber:while_1 ? [3]',
'indexes': [2],
'children': [{'id': 47,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 48,
'name': 'parseNumber:if_1 = 1#[3, -1]',
'indexes': [],
'children': []}]},
{'id': 49,
'name': 'hasNext',
'indexes': [],
'children': []}]}]}]}]}]},
16: {'id': 16,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 17, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 18,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 19, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 20,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
21: {'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
22: {'id': 22,
'name': 'parseNegative:if_1 = 1#[-1]',
'indexes': [],
'children': [{'id': 23,
'name': 'parseValue',
'indexes': [],
'children': [{'id': 24,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 25,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 26,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 27, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 28,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 30,
'name': 'parseValue:if_1 = 0#[-1]',
'indexes': [],
'children': [{'id': 31,
'name': 'parseNumber',
'indexes': [],
'children': [{'id': 32,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 33,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 34,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 35,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 36,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 38,
'name': 'parseNumber:while_1 ? [1]',
'indexes': [0],
'children': [{'id': 39,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 40,
'name': 'parseNumber:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
{'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 42,
'name': 'parseNumber:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 43,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 44,
'name': 'parseNumber:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 46,
'name': 'parseNumber:while_1 ? [3]',
'indexes': [2],
'children': [{'id': 47,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 48,
'name': 'parseNumber:if_1 = 1#[3, -1]',
'indexes': [],
'children': []}]},
{'id': 49, 'name': 'hasNext', 'indexes': [], 'children': []}]}]}]}]},
17: {'id': 17, 'name': 'hasNext', 'indexes': [], 'children': []},
18: {'id': 18,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 19, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 20,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
19: {'id': 19, 'name': 'peek', 'indexes': [], 'children': []},
20: {'id': 20,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []},
23: {'id': 23,
'name': 'parseValue',
'indexes': [],
'children': [{'id': 24,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 25, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 26,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 27, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 28,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 30,
'name': 'parseValue:if_1 = 0#[-1]',
'indexes': [],
'children': [{'id': 31,
'name': 'parseNumber',
'indexes': [],
'children': [{'id': 32,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 33,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 34,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 35,
'name': 'peek',
'indexes': [],
'children': []},
{'id': 36,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 38,
'name': 'parseNumber:while_1 ? [1]',
'indexes': [0],
'children': [{'id': 39, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 40,
'name': 'parseNumber:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
{'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 42,
'name': 'parseNumber:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 43, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 44,
'name': 'parseNumber:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 46,
'name': 'parseNumber:while_1 ? [3]',
'indexes': [2],
'children': [{'id': 47, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 48,
'name': 'parseNumber:if_1 = 1#[3, -1]',
'indexes': [],
'children': []}]},
{'id': 49, 'name': 'hasNext', 'indexes': [], 'children': []}]}]}]},
24: {'id': 24,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 25, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 26,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 27, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 28,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
29: {'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
30: {'id': 30,
'name': 'parseValue:if_1 = 0#[-1]',
'indexes': [],
'children': [{'id': 31,
'name': 'parseNumber',
'indexes': [],
'children': [{'id': 32,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 33,
'name': 'hasNext',
'indexes': [],
'children': []},
{'id': 34,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 35, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 36,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 38,
'name': 'parseNumber:while_1 ? [1]',
'indexes': [0],
'children': [{'id': 39, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 40,
'name': 'parseNumber:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
{'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 42,
'name': 'parseNumber:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 43, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 44,
'name': 'parseNumber:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 46,
'name': 'parseNumber:while_1 ? [3]',
'indexes': [2],
'children': [{'id': 47, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 48,
'name': 'parseNumber:if_1 = 1#[3, -1]',
'indexes': [],
'children': []}]},
{'id': 49, 'name': 'hasNext', 'indexes': [], 'children': []}]}]},
25: {'id': 25, 'name': 'hasNext', 'indexes': [], 'children': []},
26: {'id': 26,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 27, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 28,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
27: {'id': 27, 'name': 'peek', 'indexes': [], 'children': []},
28: {'id': 28,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []},
31: {'id': 31,
'name': 'parseNumber',
'indexes': [],
'children': [{'id': 32,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 33, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 34,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 35, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 36,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
{'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 38,
'name': 'parseNumber:while_1 ? [1]',
'indexes': [0],
'children': [{'id': 39, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 40,
'name': 'parseNumber:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
{'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 42,
'name': 'parseNumber:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 43, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 44,
'name': 'parseNumber:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
{'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 46,
'name': 'parseNumber:while_1 ? [3]',
'indexes': [2],
'children': [{'id': 47, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 48,
'name': 'parseNumber:if_1 = 1#[3, -1]',
'indexes': [],
'children': []}]},
{'id': 49, 'name': 'hasNext', 'indexes': [], 'children': []}]},
32: {'id': 32,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 33, 'name': 'hasNext', 'indexes': [], 'children': []},
{'id': 34,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 35, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 36,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]}]},
37: {'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
38: {'id': 38,
'name': 'parseNumber:while_1 ? [1]',
'indexes': [0],
'children': [{'id': 39, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 40,
'name': 'parseNumber:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
41: {'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
42: {'id': 42,
'name': 'parseNumber:while_1 ? [2]',
'indexes': [1],
'children': [{'id': 43, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 44,
'name': 'parseNumber:if_1 = 1#[2, -1]',
'indexes': [],
'children': []}]},
45: {'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
46: {'id': 46,
'name': 'parseNumber:while_1 ? [3]',
'indexes': [2],
'children': [{'id': 47, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 48,
'name': 'parseNumber:if_1 = 1#[3, -1]',
'indexes': [],
'children': []}]},
49: {'id': 49, 'name': 'hasNext', 'indexes': [], 'children': []},
33: {'id': 33, 'name': 'hasNext', 'indexes': [], 'children': []},
34: {'id': 34,
'name': 'skipWhitespace:while_1 ? [1]',
'indexes': [],
'children': [{'id': 35, 'name': 'peek', 'indexes': [], 'children': []},
{'id': 36,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []}]},
35: {'id': 35, 'name': 'peek', 'indexes': [], 'children': []},
36: {'id': 36,
'name': 'skipWhitespace:if_1 = 1#[1, -1]',
'indexes': [],
'children': []},
39: {'id': 39, 'name': 'peek', 'indexes': [], 'children': []},
40: {'id': 40,
'name': 'parseNumber:if_1 = 1#[1, -1]',
'indexes': [],
'children': []},
43: {'id': 43, 'name': 'peek', 'indexes': [], 'children': []},
44: {'id': 44,
'name': 'parseNumber:if_1 = 1#[2, -1]',
'indexes': [],
'children': []},
47: {'id': 47, 'name': 'peek', 'indexes': [], 'children': []},
48: {'id': 48,
'name': 'parseNumber:if_1 = 1#[3, -1]',
'indexes': [],
'children': []},
51: {'id': 51,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 52, 'name': 'hasNext', 'indexes': [], 'children': []}]},
53: {'id': 53, 'name': 'peek', 'indexes': [], 'children': []},
54: {'id': 54,
'name': 'parseMultiplication:if_1 = 2#[1, -1]',
'indexes': [],
'children': []},
52: {'id': 52, 'name': 'hasNext', 'indexes': [], 'children': []},
56: {'id': 56,
'name': 'skipWhitespace',
'indexes': [],
'children': [{'id': 57, 'name': 'hasNext', 'indexes': [], 'children': []}]},
58: {'id': 58, 'name': 'peek', 'indexes': [], 'children': []},
59: {'id': 59,
'name': 'parseAddition:if_1 = 2#[1, -1]',
'indexes': [],
'children': []},
57: {'id': 57, 'name': 'hasNext', 'indexes': [], 'children': []},
61: {'id': 61, 'name': 'hasNext', 'indexes': [], 'children': []}}
def wrap_input(istr):
    def extract_node(node, id):
        symbol = str(node['id'])
        children = node['children']
        annotation = str(node['name'])
        indexes = repr(tuple([istr[i] for i in node['indexes']]))
        return "%s %s" % (annotation, indexes), children, ''
    return extract_node
%top extract_node1 = wrap_input(calc_trace[0]['inputstr'])
%top zoom(display_tree(calc_method_tree1[0], extract_node=extract_node1))
%top extract_node1 = wrap_input(mathexpr_trace[0]['inputstr'])
%top zoom(display_tree(mathexpr_method_tree1[0], extract_node=extract_node1))
We define `to_node()`, a convenience function that, given a list of _contiguous_ indexes and the original string, translates them into a leaf node of a tree (corresponding to the derivation tree syntax in the Fuzzingbook) containing the string, empty children, and the starting and ending indexes.
Convert a list of indexes to a corresponding terminal tree node
def to_node(idxes, my_str):
    assert len(idxes) == idxes[-1] - idxes[0] + 1
    assert min(idxes) == idxes[0]
    assert max(idxes) == idxes[-1]
    return my_str[idxes[0]:idxes[-1] + 1], [], idxes[0], idxes[-1]
Here is how one would use it.
%top to_node(calc_method_tree1[6]['indexes'], calc_trace[0]['inputstr'])
('9', [], 0, 0)
from operator import itemgetter
import itertools as it
We now need to identify the terminal (leaf) nodes. For that, we group contiguous letters in a node together and call each group a leaf node: we first convert our list of indexes into lists of contiguous indexes, then convert each of those into a terminal tree node. The result is a set of one-level child nodes, each holding a run of contiguous characters from the indexes.
def indexes_to_children(indexes, my_str):
    lst = [
        list(map(itemgetter(1), g))
        for k, g in it.groupby(enumerate(indexes), lambda x: x[0] - x[1])
    ]
    return [to_node(n, my_str) for n in lst]
%top indexes_to_children(calc_method_tree1[6]['indexes'], calc_trace[0]['inputstr'])
[('9', [], 0, 0)]
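The grouping inside `indexes_to_children()` relies on a classic `itertools.groupby` trick: within a contiguous run of indexes, position minus value is constant. Here is the trick in isolation on a plain list of indexes:

```python
import itertools as it
from operator import itemgetter

# enumerate pairs each index with its position; within a contiguous run,
# position - value stays constant, so groupby splits exactly at the gaps.
indexes = [0, 1, 2, 7, 8, 12]
runs = [list(map(itemgetter(1), g))
        for _, g in it.groupby(enumerate(indexes), lambda x: x[0] - x[1])]
print(runs)  # [[0, 1, 2], [7, 8], [12]]
```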
Finally, we need to remove the overlap from the trees we have so far. The idea is that, given a node, each of its children should be uniquely responsible for a specified range of characters, with no overlap allowed between the children. The range of the node then spans from the start of its first child to the end of its last child.
#### Removing Overlap
If overlap is found, the tie is biased to the later child. That is, the later child gets to keep the range, and the former child is recursively traversed to remove overlaps from its children. If a child is completely included in the overlap, the child is excised. A few convenience functions first:
def does_item_overlap(s, e, s_, e_):
    return (s_ >= s and s_ <= e) or (e_ >= s and e_ <= e) or (s_ <= s and e_ >= e)
def is_second_item_included(s, e, s_, e_):
    return (s_ >= s and e_ <= e)
def has_overlap(ranges, s_, e_):
    return {(s, e) for (s, e) in ranges if does_item_overlap(s, e, s_, e_)}
def is_included(ranges, s_, e_):
    return {(s, e) for (s, e) in ranges if is_second_item_included(s, e, s_, e_)}
def remove_overlap_from(original_node, orange):
    node, children, start, end = original_node
    new_children = []
    if not children:
        return None
    start = -1
    end = -1
    for child in children:
        if does_item_overlap(*child[2:4], *orange):
            new_child = remove_overlap_from(child, orange)
            if new_child:  # and new_child[1]:
                if start == -1: start = new_child[2]
                new_children.append(new_child)
                end = new_child[3]
        else:
            new_children.append(child)
            if start == -1: start = child[2]
            end = child[3]
    if not new_children:
        return None
    assert start != -1
    assert end != -1
    return (node, new_children, start, end)
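To see the excision rule concretely, here is a minimal standalone sketch; the overlap helper and a condensed `remove_overlap_from` are restated so the example runs on its own. A leaf child that falls entirely inside the overlapping range is excised, and the parent's range shrinks accordingly:

```python
def does_item_overlap(s, e, s_, e_):
    # True if [s_, e_] overlaps [s, e]
    return (s <= s_ <= e) or (s <= e_ <= e) or (s_ <= s and e_ >= e)

def remove_overlap_from(node_, orange):
    name, children, start, end = node_
    if not children:
        return None  # a leaf caught in the overlap is excised
    start, end, new_children = -1, -1, []
    for child in children:
        if does_item_overlap(*child[2:4], *orange):
            new_child = remove_overlap_from(child, orange)
            if new_child:
                if start == -1: start = new_child[2]
                new_children.append(new_child)
                end = new_child[3]
        else:
            new_children.append(child)
            if start == -1: start = child[2]
            end = child[3]
    if not new_children:
        return None
    return (name, new_children, start, end)

# A node covering [0, 5]: its second leaf child lies entirely inside the
# overlapping range (3, 5), so that child is removed and the range shrinks.
tree = ('<A>', [('abc', [], 0, 2), ('def', [], 3, 5)], 0, 5)
print(remove_overlap_from(tree, (3, 5)))
# ('<A>', [('abc', [], 0, 2)], 0, 2)
```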
Verify that there is no overlap.
def no_overlap(arr):
    my_ranges = {}
    for a in arr:
        _, _, s, e = a
        included = is_included(my_ranges, s, e)
        if included:
            continue  # we will fill up the blanks later.
        else:
            overlaps = has_overlap(my_ranges, s, e)
            if overlaps:
                # unlike include which can happen only once in a set of
                # non-overlapping ranges, overlaps can happen on multiple parts.
                # The rule is, the later child gets the say. So, we recursively
                # remove any ranges that overlap with the current one from the
                # overlapped range.
                assert len(overlaps) == 1
                oitem = list(overlaps)[0]
                v = remove_overlap_from(my_ranges[oitem], (s, e))
                del my_ranges[oitem]
                if v:
                    my_ranges[v[2:4]] = v
                my_ranges[(s, e)] = a
            else:
                my_ranges[(s, e)] = a
    res = my_ranges.values()
    # assert no overlap, and order by starting index
    s = sorted(res, key=lambda x: x[2])
    return s
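The later-child bias can be illustrated with a simplified, leaf-only sweep. This is an illustration only: unlike the full `no_overlap()`, this sketch evicts an overlapped node outright instead of recursively trimming it.

```python
def overlaps(s, e, s_, e_):
    return not (e_ < s or s_ > e)

def later_wins(nodes):
    # Each node is (symbol, children, start, end). A later node evicts any
    # earlier node whose character range it overlaps (the later-child bias).
    kept = []
    for n in nodes:
        kept = [k for k in kept if not overlaps(k[2], k[3], n[2], n[3])]
        kept.append(n)
    return sorted(kept, key=lambda x: x[2])

print(later_wins([('a', [], 0, 3), ('b', [], 2, 5)]))
# [('b', [], 2, 5)]
```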
#### Generate derivation tree
Convert a mapped tree to the _fuzzingbook_ style derivation tree.
def to_tree(node, my_str):
    method_name = ("<%s>" % node['name']) if node['name'] is not None else '<START>'
    indexes = node['indexes']
    node_children = [to_tree(c, my_str) for c in node.get('children', [])]
    idx_children = indexes_to_children(indexes, my_str)
    children = no_overlap([c for c in node_children if c is not None] + idx_children)
    if not children:
        return None
    start_idx = children[0][2]
    end_idx = children[-1][3]
    si = start_idx
    my_children = []
    # FILL IN chars that we did not compare. This is likely due to an i + n
    # instruction.
    for c in children:
        if c[2] != si:
            sbs = my_str[si: c[2]]
            my_children.append((sbs, [], si, c[2] - 1))
        my_children.append(c)
        si = c[3] + 1
    m = (method_name, my_children, start_idx, end_idx)
    return m
%top zoom(display_tree(to_tree(calc_method_tree1[0], calc_trace[0]['inputstr'])))
%top zoom(display_tree(to_tree(mathexpr_method_tree1[0], mathexpr_trace[0]['inputstr'])))
### The Complete Miner
We now put everything together. The `miner()` takes the traces, produces trees out of them, and verifies that the trees actually correspond to the input.
from fuzzingbook.GrammarFuzzer import tree_to_string
def miner(call_traces):
    my_trees = []
    for call_trace in call_traces:
        method_map = call_trace['method_map']
        first, method_tree = reconstruct_method_tree(method_map)
        comparisons = call_trace['comparisons']
        attach_comparisons(method_tree, last_comparisons(comparisons))
        my_str = call_trace['inputstr']
        #print("INPUT:", my_str, file=sys.stderr)
        tree = to_tree(method_tree[first], my_str)
        #print("RECONSTRUCTED INPUT:", tree_to_string(tree), file=sys.stderr)
        my_tree = {'tree': tree, 'original': call_trace['original'], 'arg': call_trace['arg']}
        assert tree_to_string(tree) == my_str
        my_trees.append(my_tree)
    return my_trees
Using the `miner()`
%top mined_calc_trees = miner(calc_trace)
%top calc_tree = mined_calc_trees[0]
%top zoom(display_tree(calc_tree['tree']))
%top mined_mathexpr_trees = miner(mathexpr_trace)
%top mathexpr_tree = mined_mathexpr_trees[1]
%top zoom(display_tree(mathexpr_tree['tree']))
One of the problems you can notice in the generated tree is that each `while` iteration gets a different identifier, e.g.
```
('<parse_expr:while_1 ? [2]>', [('+', [], 5, 5)], 5, 5),
('<parse_expr:while_1 ? [3]>',
[('<parse_expr:if_1 + 0#[3, -1]>',
[('<parse_num>',
[('<is_digit>', [('7', [], 6, 6)], 6, 6),
('<is_digit>', [('2', [], 7, 7)], 7, 7)],
6,
7)],
6,
7)],
```
The separate identifiers are intentional: we do not yet know the actual dependencies between different iterations, such as closing quotes, braces, or parentheses. However, this creates a problem when we mine the grammar, because we need to match up the compatible nodes.
The generalizer does this by actively performing surgery on the tree to see whether one node can be replaced by another.
import copy
import random
### Checking compatibility of nodes
We first need a few helper functions. The `replace_nodes()` function tries to replace the _contents_ of the first node with the _contents_ of the second (that is, the tree containing these nodes is modified in place), collects the produced string from the tree, and then resets the changes. The arguments are tuples of the format: (node, file_name, tree)
def replace_nodes(a2, a1):
    node2, _, t2 = a2
    node1, _, t1 = a1
    str2_old = tree_to_string(t2)
    old = copy.copy(node2)
    node2.clear()
    for n in node1:
        node2.append(n)
    str2_new = tree_to_string(t2)
    assert str2_old != str2_new
    node2.clear()
    for n in old:
        node2.append(n)
    str2_last = tree_to_string(t2)
    assert str2_old == str2_last
    return str2_new
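Here is `replace_nodes()` exercised on a tiny pair of trees. The `tree_to_string()` below is a minimal stand-in for the fuzzingbook version (it simply concatenates leaf symbols), and the asserts are dropped for brevity:

```python
import copy

def tree_to_string(t):
    # minimal stand-in: concatenate leaf symbols left to right
    symbol, children, *_ = t
    return ''.join(tree_to_string(c) for c in children) if children else symbol

def replace_nodes(a2, a1):
    node2, _, t2 = a2
    node1, _, t1 = a1
    old = copy.copy(node2)
    node2.clear()
    for n in node1:
        node2.append(n)
    str2_new = tree_to_string(t2)   # string with the transplanted contents
    node2.clear()
    for n in old:
        node2.append(n)             # restore the original contents
    return str2_new

t1 = ['<E>', [['1', [], 0, 0]], 0, 0]
t2 = ['<E>', [['2', [], 0, 0]], 0, 0]
print(replace_nodes((t2, 'f2', t2), (t1, 'f1', t1)))  # '1'
print(tree_to_string(t2))                             # '2' -- t2 was restored
```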
Can a given node be replaced with another? The idea is, given two nodes (possibly from two trees), can the first node be replaced by the second, and still result in a valid string?
def is_compatible(a1, a2, module):
    if tree_to_string(a1[0]) == tree_to_string(a2[0]):
        return True
    my_string = replace_nodes(a1, a2)
    return check(my_string, module)
%%var check_src
# [(
import sys, imp
parse_ = imp.new_module('parse_')
def init_module(src):
    with open(src) as sf:
        exec(sf.read(), parse_.__dict__)
def _check(s):
    try:
        parse_.main(s)
        return True
    except:
        return False
def main(args):
    init_module(args[0])
    if _check(args[1]):
        sys.exit(0)
    else:
        sys.exit(1)
main(sys.argv[1:])
# )]
# [(
with open('build/check.py', 'w+') as f:
    print(VARS['check_src'], file=f)
# )]
EXEC_MAP = {}
NODE_REGISTER = {}
TREE = None
FILE = None
def reset_generalizer():
    global NODE_REGISTER, TREE, FILE, EXEC_MAP
    NODE_REGISTER = {}
    TREE = None
    FILE = None
    EXEC_MAP = {}
reset_generalizer()
import os.path
def check(s, module):
    if s in EXEC_MAP: return EXEC_MAP[s]
    result = do(["python", "./build/check.py", "subjects/%s" % module, s], shell=False)
    with open('build/%s.log' % module, 'a+') as f:
        print(s, file=f)
        print(' '.join(["python", "./build/check.py", "subjects/%s" % module, s]), file=f)
        print(":=", result.returncode, file=f)
        print("\n", file=f)
    v = (result.returncode == 0)
    EXEC_MAP[s] = v
    return v
def to_modifiable(derivation_tree):
    node, children, *rest = derivation_tree
    return [node, [to_modifiable(c) for c in children], *rest]
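Since `to_tree()` produces tuples, which are immutable, `to_modifiable()` rebuilds the tree as nested lists so that surgery on nodes becomes possible. A small self-contained illustration:

```python
def to_modifiable(derivation_tree):
    # tuples in, nested (mutable) lists out
    node, children, *rest = derivation_tree
    return [node, [to_modifiable(c) for c in children], *rest]

t = to_modifiable(('<E>', [('1', [], 0, 0)], 0, 0))
print(t)          # ['<E>', [['1', [], 0, 0]], 0, 0]
t[1][0][0] = '2'  # in-place surgery is now possible
```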
%top calc_tree_ = to_modifiable(calc_tree['tree'])
%top while_loops = calc_tree_[1][0][1][0][1]
%top while_loops[0]
['<parse_expr:while_1 ? [1]>', [['<parse_expr:if_1 = 0#[1, -1]>', [['<parse_num>', [['<is_digit>', [['9', [], 0, 0]], 0, 0]], 0, 0]], 0, 0]], 0, 0]
%top while_loops[1]
['<parse_expr:while_1 ? [2]>', [['-', [], 1, 1]], 1, 1]
%top assert not is_compatible((while_loops[1], 'c.py', calc_tree_), (while_loops[0], 'c.py', calc_tree_), 'calculator.py')
%top assert is_compatible((while_loops[0], 'c.py', calc_tree_), (while_loops[2], 'c.py', calc_tree_), 'calculator.py')
We need to extract meta information from the node names, and write it back after updating. TODO: make the meta info JSON.
import json

def parse_name(name):
    assert name[0] + name[-1] == '<>'
    name = name[1:-1]
    method, rest = name.split(':')
    ctrl_name, space, rest = rest.partition(' ')
    can_empty, space, stack = rest.partition(' ')
    ctrl, cname = ctrl_name.split('_')
    if ':while_' in name:
        method_stack = json.loads(stack)
        return method, ctrl, int(cname), 0, can_empty, method_stack
    elif ':if_' in name:
        num, mstack = stack.split('#')
        method_stack = json.loads(mstack)
        return method, ctrl, int(cname), num, can_empty, method_stack
%top [parse_name(w[0]) for w in while_loops]
[('parse_expr', 'while', 1, 0, '?', [1]),
('parse_expr', 'while', 1, 0, '?', [2]),
('parse_expr', 'while', 1, 0, '?', [3]),
('parse_expr', 'while', 1, 0, '?', [4]),
('parse_expr', 'while', 1, 0, '?', [5]),
('parse_expr', 'while', 1, 0, '?', [6]),
('parse_expr', 'while', 1, 0, '?', [7])]
xxxxxxxxxx
def unparse_name(method, ctrl, name, num, can_empty, cstack):
if ctrl == 'while':
return "<%s:%s_%s %s %s>" % (method, ctrl, name, can_empty, json.dumps(cstack))
else:
return "<%s:%s_%s %s %s#%s>" % (method, ctrl, name, can_empty, num, json.dumps(cstack))
Verify that parsing and unparsing works.
xxxxxxxxxx
%top assert all(unparse_name(*parse_name(w[0])) == w[0] for w in while_loops)
### Propagate rename of the `while` node up the child nodes.
The `update_stack()`, when given a node and a new name, recursively updates the method stack in the children.
xxxxxxxxxx
def update_stack(node, at, new_name):
nname, children, *rest = node
if not (':if_' in nname or ':while_' in nname):
return
method, ctrl, cname, num, can_empty, cstack = parse_name(nname)
cstack[at] = new_name
name = unparse_name(method, ctrl, cname, num, can_empty, cstack)
#assert '?' not in name
node[0] = name
for c in children:
update_stack(c, at, new_name)
Update the node name once we have identified that it corresponds to a global name.
xxxxxxxxxx
def update_name(k_m, my_id, seen):
# fixup k_m with what is in my_id, and update seen.
original = k_m[0]
method, ctrl, cname, num, can_empty, cstack = parse_name(original)
#assert can_empty != '?'
cstack[-1] = float('%d.0' % my_id)
name = unparse_name(method, ctrl, cname, num, can_empty, cstack)
seen[k_m[0]] = name
k_m[0] = name
# only replace it at position len(cstack)-1,
# i.e. until the first non-cf token
children = []
for c in k_m[1]:
update_stack(c, len(cstack)-1, cstack[-1])
return name, k_m
Note that the rename happens only within the current method stack. That is, it does not propagate across method calls. Here is how one would use it.
xxxxxxxxxx
%top while_loops[2]
['<parse_expr:while_1 ? [3]>',
[['<parse_expr:if_1 = 2#[3, -1]>',
[['<parse_paren>',
[['(', [], 2, 2],
['<parse_expr>',
[['<parse_expr:while_1 ? [1]>',
[['<parse_expr:if_1 = 0#[1, -1]>',
[['<parse_num>',
[['<is_digit>', [['1', [], 3, 3]], 3, 3],
['<is_digit>', [['6', [], 4, 4]], 4, 4]],
3,
4]],
3,
4]],
3,
4],
['<parse_expr:while_1 ? [2]>', [['+', [], 5, 5]], 5, 5],
['<parse_expr:while_1 ? [3]>',
[['<parse_expr:if_1 = 0#[3, -1]>',
[['<parse_num>',
[['<is_digit>', [['7', [], 6, 6]], 6, 6],
['<is_digit>', [['2', [], 7, 7]], 7, 7]],
6,
7]],
6,
7]],
6,
7]],
3,
7],
[')', [], 8, 8]],
2,
8]],
2,
8]],
2,
8]
We update the iteration number `3` with a global id `4.0`
xxxxxxxxxx
%top name, node = update_name(while_loops[2], 4, {})
%top node
['<parse_expr:while_1 ? [4.0]>',
[['<parse_expr:if_1 = 2#[4.0, -1]>',
[['<parse_paren>',
[['(', [], 2, 2],
['<parse_expr>',
[['<parse_expr:while_1 ? [1]>',
[['<parse_expr:if_1 = 0#[1, -1]>',
[['<parse_num>',
[['<is_digit>', [['1', [], 3, 3]], 3, 3],
['<is_digit>', [['6', [], 4, 4]], 4, 4]],
3,
4]],
3,
4]],
3,
4],
['<parse_expr:while_1 ? [2]>', [['+', [], 5, 5]], 5, 5],
['<parse_expr:while_1 ? [3]>',
[['<parse_expr:if_1 = 0#[3, -1]>',
[['<parse_num>',
[['<is_digit>', [['7', [], 6, 6]], 6, 6],
['<is_digit>', [['2', [], 7, 7]], 7, 7]],
6,
7]],
6,
7]],
6,
7]],
3,
7],
[')', [], 8, 8]],
2,
8]],
2,
8]],
2,
8]
##### Replace a set of nodes
We want to replace the `while` loop iteration identifiers with a global identifier. For that, we are given a list of nodes that are compatible with global ones. We first extract the iteration id from the global node, and apply it on the `while` node under consideration.
xxxxxxxxxx
def replace_stack_and_mark_star(to_replace):
# remember, we only replace whiles.
for (i, j) in to_replace:
method1, ctrl1, cname1, num1, can_empty1, cstack1 = parse_name(i[0])
method2, ctrl2, cname2, num2, can_empty2, cstack2 = parse_name(j[0])
assert method1 == method2
assert ctrl1 == ctrl2
assert cname1 == cname2
#assert can_empty2 != '?'
# fixup the can_empty
new_name = unparse_name(method1, ctrl1, cname1, num1, can_empty2, cstack1)
i[0] = new_name
assert len(cstack1) == len(cstack2)
update_stack(i, len(cstack2)-1, cstack2[-1])
to_replace.clear()
### Generalize a given set of loops
The main workhorse. It generalizes the looping constructs. It is given a set of while loops with the same label under the current node. TODO: Refactor when we actually have time.
##### Helper: node inclusion
Checking for node inclusion. We do not want to try including the first node in the second if the first node already contains the second, as that would lead to an infinite loop in `tree_to_string()`.
xxxxxxxxxx
def node_include(i, j):
name_i, children_i, s_i, e_i = i
name_j, children_j, s_j, e_j = j
return s_i <= s_j and e_i >= e_j
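Since each node carries its start and end indices into the input string, inclusion reduces to interval containment. A small standalone check (the node values below are made up):

```python
# Standalone copy of node_include: i includes j iff i's span covers j's span.
def node_include(i, j):
    name_i, children_i, s_i, e_i = i
    name_j, children_j, s_j, e_j = j
    return s_i <= s_j and e_i >= e_j

outer = ['<expr>', [], 0, 8]   # spans characters 0..8
inner = ['<num>', [], 3, 4]    # spans characters 3..4
assert node_include(outer, inner)
assert not node_include(inner, outer)
```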
##### Helper: sorting
Ordering nodes by their highest complexity to avoid spurious can-replace answers.
xxxxxxxxxx
def num_tokens(v, s):
name, child, *rest = v
s.add(name)
[num_tokens(i, s) for i in child]
return len(s)
def s_fn(v):
return num_tokens(v[0], set())
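The complexity measure is simply the number of distinct node names in a subtree; bigger subtrees exercise more of the parser and hence give stronger compatibility evidence. A standalone illustration on an invented tree:

```python
# Standalone copies of the helpers above, for illustration only.
def num_tokens(v, s):
    name, child, *rest = v
    s.add(name)                       # count each distinct node name once
    [num_tokens(i, s) for i in child]
    return len(s)

def s_fn(v):
    # v is a (node, file, tree) triple; rank by the node's complexity
    return num_tokens(v[0], set())

tree = ['<expr>', [['<num>', [['1', [], 0, 0]], 0, 0]], 0, 0]
assert num_tokens(tree, set()) == 3      # '<expr>', '<num>', '1'
assert s_fn((tree, 'file', tree)) == 3
```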
xxxxxxxxxx
MAX_SAMPLES = 1 # with reasonably complex inputs, this is sufficient if we do the surgery both ways.
First, we check whether any of the loops we have are compatible with the globally registered loops in `while_register`.
xxxxxxxxxx
def check_registered_loops_for_compatibility(idx_map, while_register, module):
seen = {}
to_replace = []
idx_keys = sorted(idx_map.keys())
for while_key, f in while_register[0]:
# try sampling here.
my_values = while_register[0][(while_key, f)]
v_ = random.choice(my_values)
for k in idx_keys:
k_m = idx_map[k]
if k_m[0] in seen: continue
if len(my_values) > MAX_SAMPLES:
lst = [v for v in my_values if not node_include(v[0], k_m)]
values = sorted(lst, key=s_fn, reverse=True)[0:MAX_SAMPLES]
else:
values = my_values
# all values in v should be tried.
replace = 0
for v in values:
assert v[0][0] == v_[0][0]
if f != FILE or not node_include(v[0], k_m): # if not k_m includes v
a = is_compatible((k_m, FILE, TREE), v, module)
if not a:
replace = 0
break
else:
replace += 1
if f != FILE or not node_include(k_m, v[0]):
b = is_compatible(v, (k_m, FILE, TREE), module)
if not b:
replace = 0
break
else:
replace += 1
# at least one needs to vouch, and all capable ones need to agree.
if replace:
to_replace.append((k_m, v_[0])) # <- replace k_m by v
seen[k_m[0]] = True
replace_stack_and_mark_star(to_replace)
Next, for all the loops that remain, check if they can be deleted. If they can be, we want to place `Epsilon == *` in place of `?` in the `can_empty` position.
xxxxxxxxxx
def can_the_loop_be_deleted(idx_map, while_register, module):
idx_keys = sorted(idx_map.keys())
for i in idx_keys:
i_m = idx_map[i]
if '.0' in i_m[0]:
# assert '?' not in i_m[0]
continue
a = is_compatible((i_m, FILE, TREE), (['', [], 0, 0], FILE, TREE), module)
method1, ctrl1, cname1, num1, can_empty, cstack1 = parse_name(i_m[0])
name = unparse_name(method1, ctrl1, cname1, num1, Epsilon if a else NoEpsilon, cstack1)
i_m[0] = name
Next, we check all current loops whether they are compatible with each other. Essentially, we start from the back, and check if the first or second or third ... nodes are compatible with the last node. Then take the second last node and do the same.
If they are, we use the same name for all compatible nodes.
xxxxxxxxxx
def check_current_loops_for_compatibility(idx_map, while_register, module):
to_replace = []
rkeys = sorted(idx_map.keys(), reverse=True)
for i in rkeys: # <- nodes to check for replacement -- started from the back
i_m = idx_map[i]
# assert '?' not in i_m[0]
if '.0' in i_m[0]: continue
j_keys = sorted([j for j in idx_map.keys() if j < i])
for j in j_keys: # <- nodes that we can replace i_m with -- starting from front.
j_m = idx_map[j]
# assert '?' not in j_m[0]
if i_m[0] == j_m[0]: break
# previous whiles worked.
replace = False
if not node_include(j_m, i_m):
a = is_compatible((i_m, FILE, TREE), (j_m, FILE, TREE), module)
if not a: continue
replace = True
if not node_include(i_m, j_m):
b = is_compatible((j_m, FILE, TREE), (i_m, FILE, TREE), module)
if not b: continue
replace = True
if replace:
to_replace.append((i_m, j_m)) # <- replace i_m by j_m
break
replace_stack_and_mark_star(to_replace)
Finally, register all the new while loops discovered.
xxxxxxxxxx
def register_new_loops(idx_map, while_register):
idx_keys = sorted(idx_map.keys())
seen = {}
for k in idx_keys:
k_m = idx_map[k]
if ".0" not in k_m[0]:
if k_m[0] in seen:
k_m[0] = seen[k_m[0]]
# and update
method1, ctrl1, cname1, num1, can_empty1, cstack1 = parse_name(k_m[0])
update_name(k_m, cstack1[-1], seen)
continue
# new! get a brand new name!
while_register[1] += 1
my_id = while_register[1]
original_name = k_m[0]
#assert '?' not in original_name
name, new_km = update_name(k_m, my_id, seen)
#assert '?' not in name
while_register[0][(name, FILE)] = [(new_km, FILE, TREE)]
else:
name = k_m[0]
if (name, FILE) not in while_register[0]:
while_register[0][(name, FILE)] = []
while_register[0][(name, FILE)].append((k_m, FILE, TREE))
All together.
xxxxxxxxxx
def generalize_loop(idx_map, while_register, module):
# First we check the previous while loops
check_registered_loops_for_compatibility(idx_map, while_register, module)
# Check whether any of these can be deleted.
can_the_loop_be_deleted(idx_map, while_register, module)
# then we check the current while iterations
check_current_loops_for_compatibility(idx_map, while_register, module)
# lastly, update all while names.
register_new_loops(idx_map, while_register)
We keep a global registry of nodes, so that we can use the same iteration labels.
xxxxxxxxxx
# NODE_REGISTER = {}
### Collect loops to generalize
The idea is to look through the tree, looking for while loops.
When one sees a while loop, start at one end, and see if the
while iteration index can be replaced by the first one, and vice
versa. If not, try with the second one and so on until the first one
succeeds. When one succeeds, replace the definition of the matching
one with an alternate with the last's definition, and replace the
name of last with the first, and delete last. Here, we only collect the while loops with same labels, with `generalize_loop()` doing the rest.
xxxxxxxxxx
def generalize(tree, module):
node, children, *_rest = tree
if node not in NODE_REGISTER:
NODE_REGISTER[node] = {}
register = NODE_REGISTER[node]
for child in children:
generalize(child, module)
idxs = {}
last_while = None
for i,child in enumerate(children):
# now we need to map the while_name here to the ones in node
# register. Essentially, we try to replace each.
if ':while_' not in child[0]:
continue
while_name = child[0].split(' ')[0]
if last_while is None:
last_while = while_name
if while_name not in register:
register[while_name] = [{}, 0]
else:
if last_while != while_name:
# a new while! Generalize the last
last_while = while_name
generalize_loop(idxs, register[last_while], module)
idxs[i] = child
if last_while is not None:
generalize_loop(idxs, register[last_while], module)
We need the ability for fairly deep surgery. So we dump and load the mined trees to convert tuples to arrays.
xxxxxxxxxx
def generalize_iter(jtrees, log=False):
global TREE, FILE
new_trees = []
for j in jtrees:
FILE = j['arg']
if log: print(FILE, file=sys.stderr)
sys.stderr.flush()
TREE = to_modifiable(j['tree'])
generalize(TREE, j['original'])
j['tree'] = TREE
new_trees.append(copy.deepcopy(j))
return new_trees
xxxxxxxxxx
from fuzzingbook.GrammarFuzzer import extract_node as extract_node_o
xxxxxxxxxx
%top reset_generalizer()
%top generalized_calc_trees = generalize_iter(mined_calc_trees)
%top zoom(display_tree(generalized_calc_trees[0]['tree'], extract_node=extract_node_o))
xxxxxxxxxx
%top reset_generalizer()
%top generalized_mathexpr_trees = generalize_iter(mined_mathexpr_trees)
%top zoom(display_tree(generalized_mathexpr_trees[1]['tree'], extract_node=extract_node_o))
## Generating a Grammar
Generating a grammar from the generalized derivation trees is pretty simple. Start at the start node, and any node that represents a method or a pseudo method becomes a nonterminal. The children form alternate expansions for the nonterminal. Since all the keys are compatible, merging the grammar is simply merging the hash map.
First, we define a pretty printer for grammar.
xxxxxxxxxx
import re
RE_NONTERMINAL = re.compile(r'(<[^<> ]*>)')
xxxxxxxxxx
def recurse_grammar(grammar, key, order, canonical):
rules = sorted(grammar[key])
old_len = len(order)
for rule in rules:
if not canonical:
res = re.findall(RE_NONTERMINAL, rule)
else:
res = rule
for token in res:
if token.startswith('<') and token.endswith('>'):
if token not in order:
order.append(token)
new = order[old_len:]
for ckey in new:
recurse_grammar(grammar, ckey, order, canonical)
xxxxxxxxxx
def show_grammar(grammar, start_symbol='<START>', canonical=True):
order = [start_symbol]
recurse_grammar(grammar, start_symbol, order, canonical)
assert len(order) == len(grammar.keys())
return {k: sorted(grammar[k]) for k in order}
xxxxxxxxxx
def to_grammar(tree, grammar):
node, children, _, _ = tree
tokens = []
if node not in grammar:
grammar[node] = list()
for c in children:
tokens.append(c[0])
if c[1]:
to_grammar(c, grammar)
grammar[node].append(tuple(tokens))
return grammar
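For instance, on a hand-made derivation tree (not from any of the subjects), `to_grammar` yields one rule per method node:

```python
# Standalone copy of to_grammar, for illustration only.
def to_grammar(tree, grammar):
    node, children, _, _ = tree
    tokens = []
    if node not in grammar:
        grammar[node] = list()
    for c in children:
        tokens.append(c[0])
        if c[1]:                      # non-leaf child: recurse
            to_grammar(c, grammar)
    grammar[node].append(tuple(tokens))
    return grammar

tree = ['<expr>', [['(', [], 0, 0],
                   ['<num>', [['1', [], 1, 1]], 1, 1],
                   [')', [], 2, 2]], 0, 2]
g = to_grammar(tree, {})
assert g == {'<expr>': [('(', '<num>', ')')], '<num>': [('1',)]}
```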
xxxxxxxxxx
def merge_grammar(g1, g2):
all_keys = set(list(g1.keys()) + list(g2.keys()))
merged = {}
for k in all_keys:
alts = set(g1.get(k, []) + g2.get(k, []))
merged[k] = alts
return {k: list(merged[k]) for k in merged}
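Merging is a per-key set union of the alternatives, so merging a grammar with itself changes nothing. A quick standalone check with toy grammars:

```python
# Standalone copy of merge_grammar, for illustration only.
def merge_grammar(g1, g2):
    all_keys = set(list(g1.keys()) + list(g2.keys()))
    merged = {}
    for k in all_keys:
        # union of the alternative expansions for each nonterminal
        merged[k] = set(g1.get(k, []) + g2.get(k, []))
    return {k: list(merged[k]) for k in merged}

g1 = {'<A>': [('a',)]}
g2 = {'<A>': [('b',)], '<B>': [('c',)]}
g = merge_grammar(g1, g2)
assert sorted(g['<A>']) == [('a',), ('b',)]
assert g['<B>'] == [('c',)]
assert merge_grammar(g, g).keys() == g.keys()   # merging with itself adds nothing
```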
xxxxxxxxxx
def convert_to_grammar(my_trees):
grammar = {}
for my_tree in my_trees:
tree = my_tree['tree']
src = my_tree['original']
g = to_grammar(tree, grammar)
grammar = merge_grammar(grammar, g)
return grammar
xxxxxxxxxx
%top calc_grammar = convert_to_grammar(generalized_calc_trees)
%top show_grammar(calc_grammar)
{'<START>': [('<main>',)],
'<main>': [('<parse_expr>',)],
'<parse_expr>': [('<parse_expr:while_1 = [1.0]>',),
('<parse_expr:while_1 = [1.0]>',
'<parse_expr:while_1 - [2.0]>',
'<parse_expr:while_1 = [1.0]>'),
('<parse_expr:while_1 = [1.0]>',
'<parse_expr:while_1 - [2.0]>',
'<parse_expr:while_1 = [1.0]>',
'<parse_expr:while_1 - [2.0]>',
'<parse_expr:while_1 = [1.0]>'),
('<parse_expr:while_1 = [1.0]>',
'<parse_expr:while_1 - [2.0]>',
'<parse_expr:while_1 = [1.0]>',
'<parse_expr:while_1 - [2.0]>',
'<parse_expr:while_1 = [1.0]>',
'<parse_expr:while_1 - [2.0]>',
'<parse_expr:while_1 = [1.0]>')],
'<parse_expr:while_1 = [1.0]>': [('<parse_expr:if_1 = 0#[1.0, -1]>',),
('<parse_expr:if_1 = 2#[1.0, -1]>',)],
'<parse_expr:while_1 - [2.0]>': [('*',), ('+',), ('-',), ('/',)],
'<parse_expr:if_1 = 0#[1.0, -1]>': [('<parse_num>',)],
'<parse_expr:if_1 = 2#[1.0, -1]>': [('<parse_paren>',)],
'<parse_num>': [('<is_digit>',),
('<is_digit>', '<is_digit>'),
('<is_digit>', '<is_digit>', '<is_digit>')],
'<is_digit>': [('0',),
('1',),
('2',),
('3',),
('4',),
('5',),
('6',),
('7',),
('8',),
('9',)],
'<parse_paren>': [('(', '<parse_expr>', ')')]}
xxxxxxxxxx
%top mathexpr_grammar = convert_to_grammar(generalized_mathexpr_trees)
%top show_grammar(mathexpr_grammar)
{'<START>': [('<main>',)],
'<main>': [('<getValue>',)],
'<getValue>': [('<parseExpression>',)],
'<parseExpression>': [('<parseAddition>',)],
'<parseAddition>': [('<parseMultiplication>',),
('<parseMultiplication>', '<parseAddition:while_1 - [1.0]>')],
'<parseMultiplication>': [('<parseParenthesis>',),
('<parseParenthesis>', '<parseMultiplication:while_1 - [1.0]>')],
'<parseAddition:while_1 - [1.0]>': [('+',
'<parseAddition:if_1 = 0#[1.0, -1]>')],
'<parseParenthesis>': [('<parseParenthesis:if_1 = 1#[-1]>',),
('<skipWhitespace>', '<parseParenthesis:if_1 = 1#[-1]>')],
'<parseMultiplication:while_1 - [1.0]>': [('<skipWhitespace>',),
('<skipWhitespace>', '*', '<parseMultiplication:if_1 = 0#[1.0, -1]>')],
'<parseParenthesis:if_1 = 1#[-1]>': [('<parseNegative>',)],
'<skipWhitespace>': [('<skipWhitespace:while_1 - [1.0]>',)],
'<parseNegative>': [('<parseNegative:if_1 = 1#[-1]>',)],
'<parseNegative:if_1 = 1#[-1]>': [('<parseValue>',)],
'<parseValue>': [('<parseValue:if_1 = 0#[-1]>',)],
'<parseValue:if_1 = 0#[-1]>': [('<parseNumber>',)],
'<parseNumber>': [('<parseNumber:while_1 - [1.0]>',),
('<parseNumber:while_1 - [1.0]>',
'<parseNumber:while_1 - [1.0]>',
'<parseNumber:while_1 - [1.0]>')],
'<parseNumber:while_1 - [1.0]>': [('0',),
('1',),
('2',),
('3',),
('4',),
('5',)],
'<skipWhitespace:while_1 - [1.0]>': [(' ',)],
'<parseMultiplication:if_1 = 0#[1.0, -1]>': [('<parseParenthesis>',)],
'<parseAddition:if_1 = 0#[1.0, -1]>': [('<parseMultiplication>',)]}
The grammar generated may still contain meta characters such as `<` and `>`. We need to clean these up to make the grammar fuzzable using the Fuzzingbook fuzzers.
xxxxxxxxxx
def to_fuzzable_grammar(grammar):
def escape(t):
if ((t[0]+t[-1]) == '<>'):
return t.replace(' ', '_')
else:
return t
new_g = {}
for k in grammar:
new_alt = []
for rule in grammar[k]:
new_alt.append(''.join([escape(t) for t in rule]))
new_g[k.replace(' ', '_')] = new_alt
return new_g
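Concretely, a nonterminal containing spaces gets its spaces replaced by underscores, and each rule collapses into a single expansion string. A standalone sketch on a toy key (not one of the subject grammars):

```python
# Standalone copy of to_fuzzable_grammar, for illustration only.
def to_fuzzable_grammar(grammar):
    def escape(t):
        # only escape nonterminals; terminal tokens stay untouched
        if (t[0] + t[-1]) == '<>':
            return t.replace(' ', '_')
        return t
    new_g = {}
    for k in grammar:
        new_g[k.replace(' ', '_')] = [''.join(escape(t) for t in rule)
                                      for rule in grammar[k]]
    return new_g

g = {'<p:while_1 - [1.0]>': [('x', '<p:while_1 - [1.0]>'), ('y',)]}
fg = to_fuzzable_grammar(g)
assert fg == {'<p:while_1_-_[1.0]>': ['x<p:while_1_-_[1.0]>', 'y']}
```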
xxxxxxxxxx
from fuzzingbook import GrammarFuzzer
xxxxxxxxxx
%%top
# [(
gf = GrammarFuzzer.GrammarFuzzer(to_fuzzable_grammar(calc_grammar), start_symbol='<START>')
for i in range(10):
print(gf.fuzz())
# )]
(6+9-8+0)-((4)-(5))/(4+0)
5/011
204/(9/4/5*0)
10+(((2/4-8))/584*41)/((((3*9*8)*07)))
(2/(0))/(8-4+2)*(4)*(9/0)
434/48/9-585
546-6
3/315*((2)*(3)-5)/(2/6+1/0)
(((1/0)*27-(3/8-9+5)))/(6)
0*9*(1+2-7-0)
xxxxxxxxxx
%%top
# [(
gf = GrammarFuzzer.GrammarFuzzer(to_fuzzable_grammar(mathexpr_grammar), start_symbol='<START>')
for i in range(10):
print(gf.fuzz())
# )]
2 *145 1 * 2+ 401 5 254 301+032 004 +5 4 * 2+011 1+420 224 2
### Inserting Empty Alternatives for IF and Loops
Next, we want to insert empty rules for those loops and conditionals that can be skipped. For loops, the entire sequence has to contain the empty marker.
xxxxxxxxxx
def check_empty_rules(grammar):
new_grammar = {}
for k in grammar:
if ':if_' in k:
name, marker = k.split('#')
if name.endswith(' *'):
new_grammar[k] = grammar[k] + [('',)]
else:
new_grammar[k] = grammar[k]
elif ':while_' in k:
# TODO -- we have to check the rules for sequences of whiles.
# for now, ignore.
new_grammar[k] = grammar[k]
else:
new_grammar[k] = grammar[k]
return new_grammar
xxxxxxxxxx
%top ne_calc_grammar = check_empty_rules(calc_grammar)
%top show_grammar(ne_calc_grammar)
{'<START>': [('<main>',)],
'<main>': [('<parse_expr>',)],
'<parse_expr>': [('<parse_expr:while_1 = [1.0]>',),
('<parse_expr:while_1 = [1.0]>',
'<parse_expr:while_1 - [2.0]>',
'<parse_expr:while_1 = [1.0]>'),
('<parse_expr:while_1 = [1.0]>',
'<parse_expr:while_1 - [2.0]>',
'<parse_expr:while_1 = [1.0]>',
'<parse_expr:while_1 - [2.0]>',
'<parse_expr:while_1 = [1.0]>'),
('<parse_expr:while_1 = [1.0]>',
'<parse_expr:while_1 - [2.0]>',
'<parse_expr:while_1 = [1.0]>',
'<parse_expr:while_1 - [2.0]>',
'<parse_expr:while_1 = [1.0]>',
'<parse_expr:while_1 - [2.0]>',
'<parse_expr:while_1 = [1.0]>')],
'<parse_expr:while_1 = [1.0]>': [('<parse_expr:if_1 = 0#[1.0, -1]>',),
('<parse_expr:if_1 = 2#[1.0, -1]>',)],
'<parse_expr:while_1 - [2.0]>': [('*',), ('+',), ('-',), ('/',)],
'<parse_expr:if_1 = 0#[1.0, -1]>': [('<parse_num>',)],
'<parse_expr:if_1 = 2#[1.0, -1]>': [('<parse_paren>',)],
'<parse_num>': [('<is_digit>',),
('<is_digit>', '<is_digit>'),
('<is_digit>', '<is_digit>', '<is_digit>')],
'<is_digit>': [('0',),
('1',),
('2',),
('3',),
('4',),
('5',),
('6',),
('7',),
('8',),
('9',)],
'<parse_paren>': [('(', '<parse_expr>', ')')]}
xxxxxxxxxx
%%top
# [(
gf = GrammarFuzzer.GrammarFuzzer(to_fuzzable_grammar(ne_calc_grammar), start_symbol='<START>')
for i in range(10):
print(repr(gf.fuzz()))
# )]
'20'
'(0*7*2+5)*(2/1)'
'((56+1+53)*((905)))'
'59'
'(7*(((((8)/((1+3+3)+(6-3*1))))*9*710)))'
'28/3*75*40'
'0/((2)/(8+4)/11)/((95/((4*3+6*9))))+6'
'96-9'
'(649-(5/1+((((212-76))/608)/4)*66))-18'
'((2))'
xxxxxxxxxx
%top ne_mathexpr_grammar = check_empty_rules(mathexpr_grammar)
%top show_grammar(ne_mathexpr_grammar)
{'<START>': [('<main>',)],
'<main>': [('<getValue>',)],
'<getValue>': [('<parseExpression>',)],
'<parseExpression>': [('<parseAddition>',)],
'<parseAddition>': [('<parseMultiplication>',),
('<parseMultiplication>', '<parseAddition:while_1 - [1.0]>')],
'<parseMultiplication>': [('<parseParenthesis>',),
('<parseParenthesis>', '<parseMultiplication:while_1 - [1.0]>')],
'<parseAddition:while_1 - [1.0]>': [('+',
'<parseAddition:if_1 = 0#[1.0, -1]>')],
'<parseParenthesis>': [('<parseParenthesis:if_1 = 1#[-1]>',),
('<skipWhitespace>', '<parseParenthesis:if_1 = 1#[-1]>')],
'<parseMultiplication:while_1 - [1.0]>': [('<skipWhitespace>',),
('<skipWhitespace>', '*', '<parseMultiplication:if_1 = 0#[1.0, -1]>')],
'<parseParenthesis:if_1 = 1#[-1]>': [('<parseNegative>',)],
'<skipWhitespace>': [('<skipWhitespace:while_1 - [1.0]>',)],
'<parseNegative>': [('<parseNegative:if_1 = 1#[-1]>',)],
'<parseNegative:if_1 = 1#[-1]>': [('<parseValue>',)],
'<parseValue>': [('<parseValue:if_1 = 0#[-1]>',)],
'<parseValue:if_1 = 0#[-1]>': [('<parseNumber>',)],
'<parseNumber>': [('<parseNumber:while_1 - [1.0]>',),
('<parseNumber:while_1 - [1.0]>',
'<parseNumber:while_1 - [1.0]>',
'<parseNumber:while_1 - [1.0]>')],
'<parseNumber:while_1 - [1.0]>': [('0',),
('1',),
('2',),
('3',),
('4',),
('5',)],
'<skipWhitespace:while_1 - [1.0]>': [(' ',)],
'<parseMultiplication:if_1 = 0#[1.0, -1]>': [('<parseParenthesis>',)],
'<parseAddition:if_1 = 0#[1.0, -1]>': [('<parseMultiplication>',)]}
xxxxxxxxxx
%%top
# [(
gf = GrammarFuzzer.GrammarFuzzer(to_fuzzable_grammar(ne_mathexpr_grammar), start_symbol='<START>')
for i in range(10):
print(repr(gf.fuzz()))
# )]
'4 * 1'
' 4+ 1'
' 051 *155+ 3'
' 303 +512 '
'2+ 2 * 1'
'124 + 5'
'444+ 221'
'4+5 '
'3 *1'
'0+233'
We now need to generalize the loops. The idea is to look for patterns exclusively in the similarly named while loops using any of the regular expression learners. For the prototype, we replaced the modified Sequitur with the modified Fernau algorithm, which gave us better regular expressions than before. The main constraint we have is that we want to avoid repeated execution of the program if possible. The Fernau algorithm can recover a reasonably approximate regular expression based only on positive data.
#### The modified Fernau algorithm
The Fernau algorithm is from _Algorithms for learning regular expressions from positive data_ by _Henning Fernau_. Our algorithm uses a modified form of the Prefix-Tree-Acceptor from Fernau. First we define an LRF buffer of a given size.
xxxxxxxxxx
import json
class Buf:
def __init__(self, size):
self.size = size
self.items = [None] * self.size
The `add1()` takes in an array, transfers the first element of the array to the end of the current buffer, and simultaneously drops (and returns) the first element of the buffer.
xxxxxxxxxx
class Buf(Buf):
def add1(self, items):
self.items.append(items.pop(0))
return self.items.pop(0)
For equality between the buffer and an array, we only compare when both the array and the items are actually elements and not chunked arrays.
xxxxxxxxxx
class Buf(Buf):
def __eq__(self, items):
if any(isinstance(i, dict) for i in self.items): return False
if any(isinstance(i, dict) for i in items): return False
return items == self.items
The `detect_chunks()` detects any repeating portions of size `n` in a list.
xxxxxxxxxx
def detect_chunks(n, lst_):
lst = list(lst_)
chunks = set()
last = Buf(n)
# check if the next_n elements are repeated.
for _ in range(len(lst) - n):
lnext_n = lst[0:n]
if last == lnext_n:
# found a repetition.
chunks.add(tuple(last.items))
last.add1(lst)
return chunks
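With the `Buf` sliding window, a pair that repeats back-to-back is flagged as a chunk. Here is a standalone sketch (re-defining `Buf` and `detect_chunks` from above) on an invented token list:

```python
# Standalone copies of Buf and detect_chunks, for illustration only.
class Buf:
    def __init__(self, size):
        self.size = size
        self.items = [None] * self.size
    def add1(self, items):
        # slide the window: consume one element, drop the oldest
        self.items.append(items.pop(0))
        return self.items.pop(0)
    def __eq__(self, items):
        if any(isinstance(i, dict) for i in self.items): return False
        if any(isinstance(i, dict) for i in items): return False
        return items == self.items

def detect_chunks(n, lst_):
    lst = list(lst_)
    chunks = set()
    last = Buf(n)
    for _ in range(len(lst) - n):
        if last == lst[0:n]:
            chunks.add(tuple(last.items))   # found an adjacent repetition
        last.add1(lst)
    return chunks

assert ('a', 'b') in detect_chunks(2, ['a', 'b', 'a', 'b', 'a', 'b'])
assert detect_chunks(2, ['a', 'b', 'c', 'd']) == set()
```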
Once we have detected plausible repeating sequences, we gather all similar sequences into arrays.
xxxxxxxxxx
def chunkify(lst_, n, chunks):
lst = list(lst_)
chunked_lst = []
while len(lst) >= n:
lnext_n = lst[0:n]
if (not any(isinstance(i, dict) for i in lnext_n)) and tuple(lnext_n) in chunks:
chunked_lst.append({'_':lnext_n})
lst = lst[n:]
else:
chunked_lst.append(lst.pop(0))
chunked_lst.extend(lst)
return chunked_lst
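Once a chunk such as `('a', 'b')` is known, `chunkify` wraps each occurrence in a `{'_': ...}` marker, which the prefix tree later treats as one repeatable unit. A standalone check on made-up tokens:

```python
# Standalone copy of chunkify, for illustration only.
def chunkify(lst_, n, chunks):
    lst = list(lst_)
    chunked_lst = []
    while len(lst) >= n:
        lnext_n = lst[0:n]
        if (not any(isinstance(i, dict) for i in lnext_n)) and tuple(lnext_n) in chunks:
            chunked_lst.append({'_': lnext_n})   # wrap the repeated chunk
            lst = lst[n:]
        else:
            chunked_lst.append(lst.pop(0))
    chunked_lst.extend(lst)
    return chunked_lst

assert chunkify(['a', 'b', 'a', 'b', 'c'], 2, {('a', 'b')}) == \
    [{'_': ['a', 'b']}, {'_': ['a', 'b']}, 'c']
```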
The `identify_chunks()` simply calls the `detect_chunks()` on all given lists, and then converts all chunks identified into arrays.
xxxxxxxxxx
def identify_chunks(my_lsts):
# initialize
all_chunks = {}
maximum = max(len(lst) for lst in my_lsts)
for i in range(1, maximum//2+1):
all_chunks[i] = set()
# First, identify chunks in each list.
for lst in my_lsts:
for i in range(1,maximum//2+1):
chunks = detect_chunks(i, lst)
all_chunks[i] |= chunks
# Then, chunkify
new_lsts = []
for lst in my_lsts:
for i in range(1,maximum//2+1):
chunks = all_chunks[i]
lst = chunkify(lst, i, chunks)
new_lsts.append(lst)
return new_lsts
##### Prefix tree acceptor
The prefix tree acceptor is a way to represent positive data. The `Node` class holds a single node in the prefix tree acceptor.
xxxxxxxxxx
class Node:
# Each tree node gets its unique id.
_uid = 0
def __init__(self, item):
# self.repeats = False
self.count = 1 # how many repetitions.
self.counters = set()
self.last = False
self.children = []
self.item = item
self.uid = Node._uid
Node._uid += 1
def update_counters(self):
self.counters.add(self.count)
self.count = 0
for c in self.children:
c.update_counters()
def __repr__(self):
return str(self.to_json())
def __str__(self):
return "(%s, [%s])" % (self.item, ' '.join([str(i) for i in self.children]))
def to_json(self):
s = ("(%s)" % ' '.join(self.item['_'])) if isinstance(self.item, dict) else str(self.item)
return (s, tuple(self.counters), [i.to_json() for i in self.children])
def inc_count(self):
self.count += 1
def add_ref(self):
self.count = 1
def get_child(self, c):
for i in self.children:
if i.item == c: return i
return None
def add_child(self, c):
# first check if it is the current node. If it is, increment
# count, and return ourselves.
if c == self.item:
self.inc_count()
return self
else:
# check if it is one of the children. If it is a child, then
# preserve its original count.
nc = self.get_child(c)
if nc is None:
nc = Node(c)
self.children.append(nc)
else:
nc.add_ref()
return nc
xxxxxxxxxx
def update_tree(lst_, root):
lst = list(lst_)
branch = root
while lst:
first, *lst = lst
branch = branch.add_child(first)
branch.last = True
return root
def create_tree_with_lsts(lsts):
Node._uid = 0
root = Node(None)
for lst in lsts:
root.count = 1 # there is at least one element.
update_tree(lst, root)
root.update_counters()
return root
def get_star(node, key):
if node.item is None:
return ''
if isinstance(node.item, dict):
# take care of counters
elements = node.item['_']
my_key = "<%s-%d-s>" % (key, node.uid)
alts = [elements]
if len(node.counters) > 1: # repetition
alts.append(elements + [my_key])
return [my_key], {my_key:alts}
else:
return [str(node.item)], {}
def node_to_grammar(node, grammar, key):
rule = []
alts = [rule]
if node.uid == 0:
my_key = "<%s>" % key
else:
my_key = "<%s-%d>" % (key, node.uid)
grammar[my_key] = alts
if node.item is not None:
mk, g = get_star(node, key)
rule.extend(mk)
grammar.update(g)
# is the node last?
if node.last:
assert node.item is not None
# add a duplicate rule that ends here.
ending_rule = list(rule)
# if there are no children, the current rule is
# any way ending.
if node.children:
alts.append(ending_rule)
if node.children:
if len(node.children) > 1:
my_ckey = "<%s-%d-c>" % (key, node.uid)
rule.append(my_ckey)
grammar[my_ckey] = [ ["<%s-%d>" % (key, c.uid)] for c in node.children]
else:
my_ckey = "<%s-%d>" % (key, node.children[0].uid)
rule.append(my_ckey)
else:
pass
for c in node.children:
node_to_grammar(c, grammar, key)
return grammar
def generate_grammar(lists, key):
lsts = identify_chunks(lists)
tree = create_tree_with_lsts(lsts)
grammar = {}
node_to_grammar(tree, grammar, key)
return grammar
Given a rule, determine the abstraction for it.
xxxxxxxxxx
def collapse_alts(rules, k):
ss = [[str(r) for r in rule] for rule in rules]
x = generate_grammar(ss, k[1:-1])
return x
xxxxxxxxxx
def collapse_rules(grammar):
r_grammar = {}
for k in grammar:
new_grammar = collapse_alts(grammar[k], k)
# merge the new_grammar with r_grammar
# we know none of the keys exist in r_grammar because
# new keys are k prefixed.
for k_ in new_grammar:
r_grammar[k_] = new_grammar[k_]
return r_grammar
xxxxxxxxxx
%top collapsed_calc_grammar = collapse_rules(ne_calc_grammar)
%top show_grammar(collapsed_calc_grammar)
{'<START>': [['<START-1>']],
'<START-1>': [['<main>']],
'<main>': [['<main-1>']],
'<main-1>': [['<parse_expr>']],
'<parse_expr>': [['<parse_expr-0-c>']],
'<parse_expr-0-c>': [['<parse_expr-1>'], ['<parse_expr-3>']],
'<parse_expr-1>': [['<parse_expr-1-s>', '<parse_expr-2>']],
'<parse_expr-3>': [['<parse_expr:while_1 = [1.0]>']],
'<parse_expr-1-s>': [['<parse_expr:while_1 = [1.0]>',
'<parse_expr:while_1 - [2.0]>'],
['<parse_expr:while_1 = [1.0]>',
'<parse_expr:while_1 - [2.0]>',
'<parse_expr-1-s>']],
'<parse_expr-2>': [['<parse_expr:while_1 = [1.0]>']],
'<parse_expr:while_1 = [1.0]>': [['<parse_expr:while_1 = [1.0]-0-c>']],
'<parse_expr:while_1 - [2.0]>': [['<parse_expr:while_1 - [2.0]-0-c>']],
'<parse_expr:while_1 = [1.0]-0-c>': [['<parse_expr:while_1 = [1.0]-1>'],
['<parse_expr:while_1 = [1.0]-2>']],
'<parse_expr:while_1 = [1.0]-1>': [['<parse_expr:if_1 = 2#[1.0, -1]>']],
'<parse_expr:while_1 = [1.0]-2>': [['<parse_expr:if_1 = 0#[1.0, -1]>']],
'<parse_expr:if_1 = 2#[1.0, -1]>': [['<parse_expr:if_1 = 2#[1.0, -1]-1>']],
'<parse_expr:if_1 = 2#[1.0, -1]-1>': [['<parse_paren>']],
'<parse_paren>': [['<parse_paren-1>']],
'<parse_paren-1>': [['(', '<parse_paren-2>']],
'<parse_paren-2>': [['<parse_expr>', '<parse_paren-3>']],
'<parse_paren-3>': [[')']],
'<parse_expr:if_1 = 0#[1.0, -1]>': [['<parse_expr:if_1 = 0#[1.0, -1]-1>']],
'<parse_expr:if_1 = 0#[1.0, -1]-1>': [['<parse_num>']],
'<parse_num>': [['<parse_num-1>']],
'<parse_num-1>': [['<parse_num-1-s>']],
'<parse_num-1-s>': [['<is_digit>'], ['<is_digit>', '<parse_num-1-s>']],
'<is_digit>': [['<is_digit-0-c>']],
'<is_digit-0-c>': [['<is_digit-10>'],
['<is_digit-1>'],
['<is_digit-2>'],
['<is_digit-3>'],
['<is_digit-4>'],
['<is_digit-5>'],
['<is_digit-6>'],
['<is_digit-7>'],
['<is_digit-8>'],
['<is_digit-9>']],
'<is_digit-10>': [['4']],
'<is_digit-1>': [['1']],
'<is_digit-2>': [['6']],
'<is_digit-3>': [['9']],
'<is_digit-4>': [['2']],
'<is_digit-5>': [['3']],
'<is_digit-6>': [['8']],
'<is_digit-7>': [['0']],
'<is_digit-8>': [['5']],
'<is_digit-9>': [['7']],
'<parse_expr:while_1 - [2.0]-0-c>': [['<parse_expr:while_1 - [2.0]-1>'],
['<parse_expr:while_1 - [2.0]-2>'],
['<parse_expr:while_1 - [2.0]-3>'],
['<parse_expr:while_1 - [2.0]-4>']],
'<parse_expr:while_1 - [2.0]-1>': [['/']],
'<parse_expr:while_1 - [2.0]-2>': [['-']],
'<parse_expr:while_1 - [2.0]-3>': [['*']],
'<parse_expr:while_1 - [2.0]-4>': [['+']]}
xxxxxxxxxx
%%top
# [(
gf = GrammarFuzzer.GrammarFuzzer(to_fuzzable_grammar(ne_mathexpr_grammar), start_symbol='<START>')
for i in range(10):
print(gf.fuzz())
# )]
152 510+0 * 0 402 *442 311 134+ 3 1+3 003 405+1 0 4 * 2+443 *2
xxxxxxxxxx
%top collapsed_mathexpr_grammar = collapse_rules(ne_mathexpr_grammar)
%top show_grammar(collapsed_mathexpr_grammar)
{'<START>': [['<START-1>']],
'<START-1>': [['<main>']],
'<main>': [['<main-1>']],
'<main-1>': [['<getValue>']],
'<getValue>': [['<getValue-1>']],
'<getValue-1>': [['<parseExpression>']],
'<parseExpression>': [['<parseExpression-1>']],
'<parseExpression-1>': [['<parseAddition>']],
'<parseAddition>': [['<parseAddition-1>']],
'<parseAddition-1>': [['<parseMultiplication>'],
['<parseMultiplication>', '<parseAddition-2>']],
'<parseMultiplication>': [['<parseMultiplication-1>']],
'<parseAddition-2>': [['<parseAddition:while_1 - [1.0]>']],
'<parseMultiplication-1>': [['<parseParenthesis>'],
['<parseParenthesis>', '<parseMultiplication-2>']],
'<parseParenthesis>': [['<parseParenthesis-0-c>']],
'<parseMultiplication-2>': [['<parseMultiplication:while_1 - [1.0]>']],
'<parseParenthesis-0-c>': [['<parseParenthesis-1>'],
['<parseParenthesis-3>']],
'<parseParenthesis-1>': [['<skipWhitespace>', '<parseParenthesis-2>']],
'<parseParenthesis-3>': [['<parseParenthesis:if_1 = 1#[-1]>']],
'<skipWhitespace>': [['<skipWhitespace-1>']],
'<parseParenthesis-2>': [['<parseParenthesis:if_1 = 1#[-1]>']],
'<skipWhitespace-1>': [['<skipWhitespace:while_1 - [1.0]>']],
'<skipWhitespace:while_1 - [1.0]>': [['<skipWhitespace:while_1 - [1.0]-1>']],
'<skipWhitespace:while_1 - [1.0]-1>': [[' ']],
'<parseParenthesis:if_1 = 1#[-1]>': [['<parseParenthesis:if_1 = 1#[-1]-1>']],
'<parseParenthesis:if_1 = 1#[-1]-1>': [['<parseNegative>']],
'<parseNegative>': [['<parseNegative-1>']],
'<parseNegative-1>': [['<parseNegative:if_1 = 1#[-1]>']],
'<parseNegative:if_1 = 1#[-1]>': [['<parseNegative:if_1 = 1#[-1]-1>']],
'<parseNegative:if_1 = 1#[-1]-1>': [['<parseValue>']],
'<parseValue>': [['<parseValue-1>']],
'<parseValue-1>': [['<parseValue:if_1 = 0#[-1]>']],
'<parseValue:if_1 = 0#[-1]>': [['<parseValue:if_1 = 0#[-1]-1>']],
'<parseValue:if_1 = 0#[-1]-1>': [['<parseNumber>']],
'<parseNumber>': [['<parseNumber-1>']],
'<parseNumber-1>': [['<parseNumber-1-s>']],
'<parseNumber-1-s>': [['<parseNumber:while_1 - [1.0]>'],
['<parseNumber:while_1 - [1.0]>', '<parseNumber-1-s>']],
'<parseNumber:while_1 - [1.0]>': [['<parseNumber:while_1 - [1.0]-0-c>']],
'<parseNumber:while_1 - [1.0]-0-c>': [['<parseNumber:while_1 - [1.0]-1>'],
['<parseNumber:while_1 - [1.0]-2>'],
['<parseNumber:while_1 - [1.0]-3>'],
['<parseNumber:while_1 - [1.0]-4>'],
['<parseNumber:while_1 - [1.0]-5>'],
['<parseNumber:while_1 - [1.0]-6>']],
'<parseNumber:while_1 - [1.0]-1>': [['1']],
'<parseNumber:while_1 - [1.0]-2>': [['2']],
'<parseNumber:while_1 - [1.0]-3>': [['0']],
'<parseNumber:while_1 - [1.0]-4>': [['3']],
'<parseNumber:while_1 - [1.0]-5>': [['5']],
'<parseNumber:while_1 - [1.0]-6>': [['4']],
'<parseMultiplication:while_1 - [1.0]>': [['<parseMultiplication:while_1 - [1.0]-1>']],
'<parseMultiplication:while_1 - [1.0]-1>': [['<skipWhitespace>'],
['<skipWhitespace>', '<parseMultiplication:while_1 - [1.0]-2>']],
'<parseMultiplication:while_1 - [1.0]-2>': [['*',
'<parseMultiplication:while_1 - [1.0]-3>']],
'<parseMultiplication:while_1 - [1.0]-3>': [['<parseMultiplication:if_1 = 0#[1.0, -1]>']],
'<parseMultiplication:if_1 = 0#[1.0, -1]>': [['<parseMultiplication:if_1 = 0#[1.0, -1]-1>']],
'<parseMultiplication:if_1 = 0#[1.0, -1]-1>': [['<parseParenthesis>']],
'<parseAddition:while_1 - [1.0]>': [['<parseAddition:while_1 - [1.0]-1>']],
'<parseAddition:while_1 - [1.0]-1>': [['+',
'<parseAddition:while_1 - [1.0]-2>']],
'<parseAddition:while_1 - [1.0]-2>': [['<parseAddition:if_1 = 0#[1.0, -1]>']],
'<parseAddition:if_1 = 0#[1.0, -1]>': [['<parseAddition:if_1 = 0#[1.0, -1]-1>']],
'<parseAddition:if_1 = 0#[1.0, -1]-1>': [['<parseMultiplication>']]}
xxxxxxxxxx
%%top
# [(
gf = GrammarFuzzer.GrammarFuzzer(to_fuzzable_grammar(collapsed_mathexpr_grammar), start_symbol='<START>')
for i in range(10):
print(gf.fuzz())
# )]
404 5052 154+ 3 24 *5+5 222 * 224+ 01 05+252 255 2 1+ 0110 42
xxxxxxxxxx
%%top
# [(
gf = GrammarFuzzer.GrammarFuzzer(to_fuzzable_grammar(collapsed_calc_grammar), start_symbol='<START>')
for i in range(10):
print(gf.fuzz())
# )]
00/(((7/99+((6)/(6/(1)))+7))) (1)+415748/8 8 451 ((0273-948)) (7-5*0)*(8+1+6*2) 1 67+(5)*3 ((91/4)) (9*1*(53))*(6/1)*182
xxxxxxxxxx
def convert_spaces(grammar):
keys = {key: key.replace(' ', '_') for key in grammar}
new_grammar = {}
for key in grammar:
new_alt = []
for rule in grammar[key]:
new_rule = []
for t in rule:
for k in keys:
t = t.replace(k, keys[k])
new_rule.append(t)
new_alt.append(''.join(new_rule))
new_grammar[keys[key]] = new_alt
return new_grammar
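To illustrate, here is a standalone sketch of `convert_spaces` applied to a toy canonical grammar (the key `<k 1>` is a hypothetical example; the function is restated so the cell runs on its own):

```python
def convert_spaces(grammar):
    # copy of the definition above: replace spaces in key names,
    # and join each canonical token list into a single rule string
    keys = {key: key.replace(' ', '_') for key in grammar}
    new_grammar = {}
    for key in grammar:
        new_alt = []
        for rule in grammar[key]:
            new_rule = []
            for t in rule:
                for k in keys:
                    t = t.replace(k, keys[k])
                new_rule.append(t)
            new_alt.append(''.join(new_rule))
        new_grammar[keys[key]] = new_alt
    return new_grammar

toy = {'<k 1>': [['<k 1>', 'x'], ['x']]}
assert convert_spaces(toy) == {'<k_1>': ['<k_1>x', 'x']}
```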
xxxxxxxxxx
%top calc_grammar = convert_spaces(collapsed_calc_grammar)
%top show_grammar(calc_grammar, canonical=False)
{'<START>': ['<START-1>'],
'<START-1>': ['<main>'],
'<main>': ['<main-1>'],
'<main-1>': ['<parse_expr>'],
'<parse_expr>': ['<parse_expr-0-c>'],
'<parse_expr-0-c>': ['<parse_expr-1>', '<parse_expr-3>'],
'<parse_expr-1>': ['<parse_expr-1-s><parse_expr-2>'],
'<parse_expr-3>': ['<parse_expr:while_1_=_[1.0]>'],
'<parse_expr-1-s>': ['<parse_expr:while_1_=_[1.0]><parse_expr:while_1_-_[2.0]>',
'<parse_expr:while_1_=_[1.0]><parse_expr:while_1_-_[2.0]><parse_expr-1-s>'],
'<parse_expr-2>': ['<parse_expr:while_1_=_[1.0]>'],
'<parse_expr:while_1_=_[1.0]>': ['<parse_expr:while_1_=_[1.0]-0-c>'],
'<parse_expr:while_1_-_[2.0]>': ['<parse_expr:while_1_-_[2.0]-0-c>'],
'<parse_expr:while_1_=_[1.0]-0-c>': ['<parse_expr:while_1_=_[1.0]-1>',
'<parse_expr:while_1_=_[1.0]-2>'],
'<parse_expr:while_1_=_[1.0]-1>': ['<parse_expr:if_1_=_2#[1.0,_-1]>'],
'<parse_expr:while_1_=_[1.0]-2>': ['<parse_expr:if_1_=_0#[1.0,_-1]>'],
'<parse_expr:if_1_=_2#[1.0,_-1]>': ['<parse_expr:if_1_=_2#[1.0,_-1]-1>'],
'<parse_expr:if_1_=_2#[1.0,_-1]-1>': ['<parse_paren>'],
'<parse_paren>': ['<parse_paren-1>'],
'<parse_paren-1>': ['(<parse_paren-2>'],
'<parse_paren-2>': ['<parse_expr><parse_paren-3>'],
'<parse_paren-3>': [')'],
'<parse_expr:if_1_=_0#[1.0,_-1]>': ['<parse_expr:if_1_=_0#[1.0,_-1]-1>'],
'<parse_expr:if_1_=_0#[1.0,_-1]-1>': ['<parse_num>'],
'<parse_num>': ['<parse_num-1>'],
'<parse_num-1>': ['<parse_num-1-s>'],
'<parse_num-1-s>': ['<is_digit>', '<is_digit><parse_num-1-s>'],
'<is_digit>': ['<is_digit-0-c>'],
'<is_digit-0-c>': ['<is_digit-10>',
'<is_digit-1>',
'<is_digit-2>',
'<is_digit-3>',
'<is_digit-4>',
'<is_digit-5>',
'<is_digit-6>',
'<is_digit-7>',
'<is_digit-8>',
'<is_digit-9>'],
'<is_digit-10>': ['4'],
'<is_digit-1>': ['1'],
'<is_digit-2>': ['6'],
'<is_digit-3>': ['9'],
'<is_digit-4>': ['2'],
'<is_digit-5>': ['3'],
'<is_digit-6>': ['8'],
'<is_digit-7>': ['0'],
'<is_digit-8>': ['5'],
'<is_digit-9>': ['7'],
'<parse_expr:while_1_-_[2.0]-0-c>': ['<parse_expr:while_1_-_[2.0]-1>',
'<parse_expr:while_1_-_[2.0]-2>',
'<parse_expr:while_1_-_[2.0]-3>',
'<parse_expr:while_1_-_[2.0]-4>'],
'<parse_expr:while_1_-_[2.0]-1>': ['/'],
'<parse_expr:while_1_-_[2.0]-2>': ['-'],
'<parse_expr:while_1_-_[2.0]-3>': ['*'],
'<parse_expr:while_1_-_[2.0]-4>': ['+']}
xxxxxxxxxx
from fuzzingbook import GrammarFuzzer, GrammarMiner, Parser
xxxxxxxxxx
%top gf = GrammarFuzzer.GrammarFuzzer(calc_grammar, start_symbol='<START>')
xxxxxxxxxx
%%top
# [(
for i in range(10):
print(gf.fuzz())
# )]
(9+7+1-6*(8/9*1)) (79*489+(643-(89))-(6)) ((64))-((0)*1)+0/0-0/1*65 (2/95) ((107038*((((959385427839*(8+(53)))/2)-2/920)))+20266) 152+89 09 (3+(434)-(55-9)) ((345)/(((((96*474-(86*(((((5)-66)))*(9+(((20)*(6/(724))+073-((9)))-95)))))))))+36) (((628)))/((3315+4))+8/(78119)
xxxxxxxxxx
def first_in_chain(token, chain):
while True:
if token in chain:
token = chain[token]
else:
break
return token
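`first_in_chain` resolves a chain of pending key replacements to its final target. A minimal standalone check on a toy chain (condensed copy of the definition above; note that the loop assumes the chain is acyclic):

```python
def first_in_chain(token, chain):
    # follow replacement links until we reach a token with no further mapping
    while token in chain:
        token = chain[token]
    return token

chain = {'<a>': '<b>', '<b>': '<c>'}
assert first_in_chain('<a>', chain) == '<c>'   # follows two links
assert first_in_chain('<x>', chain) == '<x>'   # unmapped tokens pass through
```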
Return a new symbol for `grammar` based on `symbol_name`.
xxxxxxxxxx
def new_symbol(grammar, symbol_name="<symbol>"):
if symbol_name not in grammar:
return symbol_name
count = 1
while True:
tentative_symbol_name = symbol_name[:-1] + "-" + repr(count) + ">"
if tentative_symbol_name not in grammar:
return tentative_symbol_name
count += 1
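A quick standalone check of `new_symbol` on a toy grammar shows how name clashes are resolved (definition copied from above):

```python
def new_symbol(grammar, symbol_name="<symbol>"):
    # return symbol_name if unused, else append -1, -2, ... until a free name is found
    if symbol_name not in grammar:
        return symbol_name
    count = 1
    while True:
        tentative_symbol_name = symbol_name[:-1] + "-" + repr(count) + ">"
        if tentative_symbol_name not in grammar:
            return tentative_symbol_name
        count += 1

g = {'<expr>': [], '<expr-1>': []}
assert new_symbol(g, '<expr>') == '<expr-2>'  # skips names already taken
assert new_symbol(g, '<term>') == '<term>'    # free names are returned as-is
```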
Replace keys that have a single token definition with the token in the definition.
xxxxxxxxxx
def replacement_candidates(grammar):
to_replace = {}
for k in grammar:
if len(grammar[k]) != 1: continue
if k in {'<START>', '<main>'}: continue
rule = grammar[k][0]
res = re.findall(RE_NONTERMINAL, rule)
if len(res) == 1:
if len(res[0]) != len(rule): continue
to_replace[k] = first_in_chain(res[0], to_replace)
elif len(res) == 0:
to_replace[k] = first_in_chain(rule, to_replace)
else:
continue # more than one.
return to_replace
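A standalone sketch of `replacement_candidates` on a toy grammar. Here `RE_NONTERMINAL` is an assumed definition matching nonterminals like `<a>` (its real definition appears earlier in the notebook), and the two helpers are condensed copies of the functions above:

```python
import re

RE_NONTERMINAL = re.compile(r'(<[^<>]+>)')  # assumption: matches one nonterminal

def first_in_chain(token, chain):
    while token in chain:
        token = chain[token]
    return token

def replacement_candidates(grammar):
    # keys with a single rule that is either one bare nonterminal or pure terminals
    to_replace = {}
    for k in grammar:
        if len(grammar[k]) != 1: continue
        if k in {'<START>', '<main>'}: continue
        rule = grammar[k][0]
        res = re.findall(RE_NONTERMINAL, rule)
        if len(res) == 1 and len(res[0]) == len(rule):
            to_replace[k] = first_in_chain(res[0], to_replace)
        elif len(res) == 0:
            to_replace[k] = first_in_chain(rule, to_replace)
    return to_replace

g = {'<START>': ['<a>'], '<a>': ['<b>'], '<b>': ['x']}
assert replacement_candidates(g) == {'<a>': '<b>', '<b>': 'x'}
```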
xxxxxxxxxx
def replace_key_by_new_key(grammar, keys_to_replace):
new_grammar = {}
for key in grammar:
new_rules = []
for rule in grammar[key]:
for k in keys_to_replace:
rule = rule.replace(k, keys_to_replace[k])
new_rules.append(rule)
new_grammar[keys_to_replace.get(key, key)] = new_rules
assert len(grammar) == len(new_grammar)
return new_grammar
xxxxxxxxxx
def replace_key_by_key(grammar, keys_to_replace):
new_grammar = {}
for key in grammar:
if key in keys_to_replace:
continue
new_rules = []
for rule in grammar[key]:
for k in keys_to_replace:
rule = rule.replace(k, keys_to_replace[k])
new_rules.append(rule)
new_grammar[key] = new_rules
return new_grammar
xxxxxxxxxx
def remove_single_entries(grammar):
keys_to_replace = replacement_candidates(grammar)
return replace_key_by_key(grammar, keys_to_replace)
Remove keys that have similar rules.
xxxxxxxxxx
def collect_duplicate_rule_keys(grammar):
collect = {}
for k in grammar:
salt = str(sorted(grammar[k]))
if salt not in collect:
collect[salt] = (k, set())
else:
collect[salt][1].add(k)
return collect
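`collect_duplicate_rule_keys` buckets keys by their sorted rule list; the first key seen per bucket becomes the canonical one, and the rest are marked for merging. A standalone check on a toy grammar (definition copied from above):

```python
def collect_duplicate_rule_keys(grammar):
    # bucket keys by their (sorted) rule set; first key seen is canonical
    collect = {}
    for k in grammar:
        salt = str(sorted(grammar[k]))
        if salt not in collect:
            collect[salt] = (k, set())
        else:
            collect[salt][1].add(k)
    return collect

g = {'<a>': ['x', 'y'], '<b>': ['y', 'x'], '<c>': ['z']}
collect = collect_duplicate_rule_keys(g)
# '<a>' and '<b>' share the same rules (up to order), so '<b>' merges into '<a>'
assert collect[str(sorted(['x', 'y']))] == ('<a>', {'<b>'})
assert collect[str(['z'])] == ('<c>', set())
```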
xxxxxxxxxx
def remove_duplicate_rule_keys(grammar):
g = grammar
while True:
collect = collect_duplicate_rule_keys(g)
keys_to_replace = {}
for salt in collect:
k, st = collect[salt]
for s in st:
keys_to_replace[s] = k
if not keys_to_replace:
break
g = replace_key_by_key(g, keys_to_replace)
return g
Remove all the control flow vestiges from names, and simply name them sequentially.
xxxxxxxxxx
def collect_replacement_keys(grammar):
g = copy.deepcopy(grammar)
to_replace = {}
for k in grammar:
if ':' in k:
first, rest = k.split(':')
sym = new_symbol(g, symbol_name=first + '>')
assert sym not in g
g[sym] = None
to_replace[k] = sym
else:
continue
return to_replace
xxxxxxxxxx
def cleanup_tokens(grammar):
keys_to_replace = collect_replacement_keys(grammar)
g = replace_key_by_new_key(grammar, keys_to_replace)
return g
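A standalone sketch of how control-flow-annotated keys such as `<parse:if_1>` receive fresh plain names (toy grammar; `new_symbol` and `collect_replacement_keys` are copies of the definitions above):

```python
import copy

def new_symbol(grammar, symbol_name="<symbol>"):
    if symbol_name not in grammar:
        return symbol_name
    count = 1
    while True:
        tentative = symbol_name[:-1] + "-" + repr(count) + ">"
        if tentative not in grammar:
            return tentative
        count += 1

def collect_replacement_keys(grammar):
    # map control-flow-annotated keys like '<parse:if_1>' to fresh plain names
    g = copy.deepcopy(grammar)
    to_replace = {}
    for k in grammar:
        if ':' not in k:
            continue
        first, rest = k.split(':')
        sym = new_symbol(g, symbol_name=first + '>')
        g[sym] = None  # reserve the name so later keys get distinct symbols
        to_replace[k] = sym
    return to_replace

g = {'<parse>': ['<parse:if_1>'], '<parse:if_1>': ['x']}
assert collect_replacement_keys(g) == {'<parse:if_1>': '<parse-1>'}
```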
xxxxxxxxxx
def replaceAngular(grammar):
new_g = {}
replaced = False
for k in grammar:
new_rules = []
for rule in grammar[k]:
new_rule = rule.replace('<>', '<openA><closeA>').replace('</>', '<openA>/<closeA>')
if rule != new_rule:
replaced = True
new_rules.append(new_rule)
new_g[k] = new_rules
if replaced:
new_g['<openA>'] = ['<']
new_g['<closeA>'] = ['>']
return new_g
Remove keys that are referred to only from a single rule, and which have a single alternative. Note that this can't work on the canonical representation. First, given a key, we figure out its distance to `<START>`.
This is different from `remove_single_entries()` in that there we do not care whether the key is used multiple times; here, we only replace keys that are referred to exactly once.
xxxxxxxxxx
import math
xxxxxxxxxx
def len_to_start(item, parents, seen=None):
if seen is None: seen = set()
if item in seen:
return math.inf
seen.add(item)
if item == '<START>':
return 0
else:
return 1 + min(len_to_start(p, parents, seen) for p in parents[item])
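`len_to_start` computes the shortest chain of parent links from a key up to `<START>`, returning infinity on cycles. A standalone check with a toy parent map (definition copied from above):

```python
import math

def len_to_start(item, parents, seen=None):
    # shortest number of parent links from item to '<START>'; inf on cycles
    if seen is None: seen = set()
    if item in seen:
        return math.inf
    seen.add(item)
    if item == '<START>':
        return 0
    return 1 + min(len_to_start(p, parents, seen) for p in parents[item])

parents = {'<a>': ['<START>'], '<b>': ['<a>']}
assert len_to_start('<a>', parents) == 1
assert len_to_start('<b>', parents) == 2
assert len_to_start('<c>', {'<c>': ['<c>']}) == math.inf  # self-loop never reaches <START>
```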
xxxxxxxxxx
def order_by_length_to_start(items, parents):
return sorted(items, key=lambda i: len_to_start(i, parents))
Next, we generate a map of `child -> [parents]`.
xxxxxxxxxx
def id_parents(grammar, key, seen=None, parents=None):
if parents is None:
parents = {}
seen = set()
if key in seen: return
seen.add(key)
for rule in grammar[key]:
res = re.findall(RE_NONTERMINAL, rule)
for token in res:
if token.startswith('<') and token.endswith('>'):
if token not in parents: parents[token] = list()
parents[token].append(key)
for ckey in {i for i in grammar if i not in seen}:
id_parents(grammar, ckey, seen, parents)
return parents
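A standalone check of the child-to-parents map on a toy grammar. This is a condensed copy of `id_parents`, with `RE_NONTERMINAL` assumed to match nonterminals like `<a>` (its real definition appears earlier in the notebook):

```python
import re

RE_NONTERMINAL = re.compile(r'(<[^<>]+>)')  # assumption: matches one nonterminal

def id_parents(grammar, key, seen=None, parents=None):
    # build a child -> [parents] map by scanning every rule for nonterminals
    if parents is None:
        parents = {}
        seen = set()
    if key in seen: return parents
    seen.add(key)
    for rule in grammar[key]:
        for token in re.findall(RE_NONTERMINAL, rule):
            parents.setdefault(token, []).append(key)
    for ckey in {i for i in grammar if i not in seen}:
        id_parents(grammar, ckey, seen, parents)
    return parents

g = {'<START>': ['<a><b>'], '<a>': ['x'], '<b>': ['<a>']}
parents = id_parents(g, '<START>')
assert sorted(parents['<a>']) == ['<START>', '<b>']
assert parents['<b>'] == ['<START>']
```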
Now, all together.
xxxxxxxxxx
def remove_single_alts(grammar, start_symbol='<START>'):
single_alts = {p for p in grammar if len(grammar[p]) == 1 and p != start_symbol}
child_parent_map = id_parents(grammar, start_symbol)
single_refs = {p:child_parent_map[p] for p in single_alts if len(child_parent_map[p]) <= 1}
keys_to_replace = {p:grammar[p][0] for p in order_by_length_to_start(single_refs, child_parent_map)}
g = replace_key_by_key(grammar, keys_to_replace)
return g
xxxxxxxxxx
import os
import hashlib
xxxxxxxxxx
def accio_grammar(fname, src, samples, cache=True):
hash_id = hashlib.md5(json.dumps(samples).encode()).hexdigest()
cache_file = "build/%s_%s_generalized_tree.json" % (fname, hash_id)
if os.path.exists(cache_file) and cache:
with open(cache_file) as f:
generalized_tree = json.load(f)
else:
# regenerate the program
program_src[fname] = src
with open('subjects/%s' % fname, 'w+') as f:
print(src, file=f)
resrc = rewrite(src, fname)
with open('build/%s' % fname, 'w+') as f:
print(resrc, file=f)
os.makedirs('samples/%s' % fname, exist_ok=True)
sample_files = {("samples/%s/%d.csv"%(fname,i)):s for i,s in enumerate(samples)}
for k in sample_files:
with open(k, 'w+') as f:
print(sample_files[k], file=f)
call_trace = []
for i in sample_files:
thash_id = hashlib.md5(json.dumps(sample_files[i]).encode()).hexdigest()
trace_cache_file = "build/%s_%s_trace.json" % (fname, thash_id)
if os.path.exists(trace_cache_file) and cache:
with open(trace_cache_file) as f:
my_tree = f.read()
else:
my_tree = do(["python", "./build/%s" % fname, i]).stdout
with open(trace_cache_file, 'w+') as f:
print(my_tree, file=f)
call_trace.append(json.loads(my_tree)[0])
mined_tree = miner(call_trace)
generalized_tree = generalize_iter(mined_tree)
# cache this costly data structure.
with open(cache_file, 'w+') as f:
json.dump(generalized_tree, f)
g = convert_to_grammar(generalized_tree)
with open('build/%s_grammar_1.json' % fname, 'w+') as f:
json.dump(g, f)
g = check_empty_rules(g)
with open('build/%s_grammar_2.json' % fname, 'w+') as f:
json.dump(g, f)
g = collapse_rules(g) # <- regex learner
with open('build/%s_grammar_3.json' % fname, 'w+') as f:
json.dump(g, f)
g = convert_spaces(g)
with open('build/%s_grammar_4.json' % fname, 'w+') as f:
json.dump(g, f)
e = remove_single_alts(cleanup_tokens(remove_duplicate_rule_keys(remove_single_entries(g))))
e = show_grammar(e, canonical=False)
with open('build/%s_grammar.json' % fname, 'w+') as f:
json.dump(e, f)
return e
xxxxxxxxxx
%top calc_grammar = accio_grammar('calculator.py', VARS['calc_src'], ['(1+2)-2', '11'])
xxxxxxxxxx
%top calc_grammar
{'<START>': ['<parse_expr-1>'],
'<parse_expr-1>': ['<parse_expr-3>',
'<parse_expr-3><parse_expr><parse_expr-3>'],
'<parse_expr-3>': ['(<parse_expr-1>)', '<is_digit-0-c>'],
'<parse_expr>': ['+', '-'],
'<is_digit-0-c>': ['1', '2']}
xxxxxxxxxx
%top gf = GrammarFuzzer.GrammarFuzzer(calc_grammar, start_symbol='<START>')
xxxxxxxxxx
%%top
# [(
for i in range(10):
print(gf.fuzz())
# )]
(2) 2+(2) ((2-((1)))-1)-((2)-(2)) (1)+(((1))-(1+(((2-1))-((1-1)-(((2-(2-1))+2)+(((1-((2-1)))+1)-2)))))) 1+1 2 (((1+(((1-1))-(((1+2)+2))))))-1 1 1-1 (2+((1)))
## Libraries
We need a few instrumented supporting libraries.
xxxxxxxxxx
%%var myio_src↔
xxxxxxxxxx
# [(
with open('build/myio.py', 'w+') as f:
print(VARS['myio_src'], file=f)
# )]
%%var mylex_src↔
xxxxxxxxxx
# [(
with open('build/mylex.py', 'w+') as f:
print(VARS['mylex_src'], file=f)
# )]
xxxxxxxxxx
import fuzzingbook
xxxxxxxxxx
assert os.path.isfile('json.tar.gz') # for microjson validation
xxxxxxxxxx
Max_Precision = 1000
Max_Recall = 1000
Autogram = {}
AutogramFuzz = {}
AutogramGrammar = {}
Mimid = {}
MimidFuzz = {}
MimidGrammar = {}
MaxTimeout = 60*60 # 60 minutes
MaxParseTimeout = 60*5
CHECK = {'cgidecode','calculator', 'mathexpr', 'urlparse', 'netrc', 'microjson'}
reset_generalizer()
xxxxxxxxxx
def recover_grammar_with_taints(name, src, samples):
header = '''
import fuzzingbook.GrammarMiner
from fuzzingbook.GrammarMiner import Tracer
from fuzzingbook.InformationFlow import ostr
from fuzzingbook.GrammarMiner import TaintedScopedGrammarMiner as TSGM
from fuzzingbook.GrammarMiner import readable
import subjects.autogram_%s
import fuzzingbook
class ostr_new(ostr):
def __new__(cls, value, *args, **kw):
return str.__new__(cls, value)
def __init__(self, value, taint=None, origin=None, **kwargs):
self.taint = taint
if origin is None:
origin = ostr.DEFAULT_ORIGIN
if isinstance(origin, int):
self.origin = list(range(origin, origin + len(self)))
else:
self.origin = origin
#assert len(self.origin) == len(self) <-- bug fix here.
class ostr_new(ostr_new):
def create(self, res, origin=None):
return ostr_new(res, taint=self.taint, origin=origin)
def __repr__(self):
# bugfix here.
return str.__repr__(self)
def recover_grammar_with_taints(fn, inputs, **kwargs):
miner = TSGM()
for inputstr in inputs:
with Tracer(ostr_new(inputstr), **kwargs) as tracer:
fn(tracer.my_input)
miner.update_grammar(tracer.my_input, tracer.trace)
return readable(miner.clean_grammar())
def replaceAngular(grammar):
# special handling for Autogram because it does not look for <> and </>
# in rules, which messes up with parsing.
new_g = {}
replaced = False
for k in grammar:
new_rules = []
for rule in grammar[k]:
new_rule = rule.replace('<>', '<openA><closeA>').replace('</>', '<openA>/<closeA>').replace('<lambda>','<openA>lambda<closeA>')
if rule != new_rule:
replaced = True
new_rules.append(new_rule)
new_g[k] = new_rules
if replaced:
new_g['<openA>'] = ['<']
new_g['<closeA>'] = ['>']
return new_g
def replace_start(grammar):
assert '<start>' in grammar
start = grammar['<start>']
del grammar['<start>']
grammar['<START>'] = start
return replaceAngular(grammar)
samples = [i.strip() for i in [
%s
] if i.strip()]
import json
autogram_grammar_t = recover_grammar_with_taints(subjects.autogram_%s.main, samples)
print(json.dumps(replace_start(autogram_grammar_t)))
'''
mod_name = name.replace('.py','')
with open('./subjects/autogram_%s' % name, 'w+') as f:
print(src, file=f)
with open('./build/autogram_%s' % name, 'w+') as f:
print(header % (mod_name, ',\n'.join([repr(i) for i in samples]), mod_name), file=f)
with ExpectTimeout(MaxTimeout):
result = do(["python","./build/autogram_%s" % name], env={'PYTHONPATH':'.'}, log=True)
if result.stderr.strip():
print(result.stderr, file=sys.stderr)
return show_grammar(json.loads(result.stdout), canonical=False)
return {}
xxxxxxxxxx
from fuzzingbook.Parser import IterativeEarleyParser
### Check Recall
How many of the *valid* inputs from the golden grammar can be recognized by a parser using our grammar?
xxxxxxxxxx
def check_recall(golden_grammar, my_grammar, validator, maximum=Max_Recall, log=False):
my_count = maximum
ie = IterativeEarleyParser(my_grammar, start_symbol='<START>')
golden = GrammarFuzzer.GrammarFuzzer(golden_grammar, start_symbol='<START>')
success = 0
while my_count != 0:
src = golden.fuzz()
try:
validator(src)
my_count -= 1
try:
#print('?', repr(src), file=sys.stderr)
for tree in ie.parse(src):
success += 1
break
if log: print(maximum - my_count, '+', repr(src), success, file=sys.stderr)
except:
#print("Error:", sys.exc_info()[0], file=sys.stderr)
if log: print(maximum - my_count, '-', repr(src), file=sys.stderr)
pass
except:
pass
return (success, maximum)
### Check Precision
How many of the inputs produced using our grammar are valid? (Accepted by the program).
xxxxxxxxxx
def check_precision(name, grammar, maximum=Max_Precision, log=False):
success = 0
with ExpectError():
fuzzer = GrammarFuzzer.GrammarFuzzer(grammar, start_symbol='<START>')
for i in range(maximum):
v = fuzzer.fuzz()
c = check(v, name)
success += (1 if c else 0)
if log: print(i, repr(v), c)
return (success, maximum)
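Both checks return a `(success, total)` tuple. A small helper (an addition here, not part of the original evaluation code) turns such a tuple into a percentage for reporting:

```python
def as_percent(result):
    # (success, total) -> percentage of inputs that passed the check
    success, total = result
    return 100.0 * success / total

assert as_percent((1000, 1000)) == 100.0  # perfect precision/recall
assert as_percent((850, 1000)) == 85.0
```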
xxxxxxxxxx
from datetime import datetime
xxxxxxxxxx
class timeit():
def __enter__(self):
self.tic = datetime.now()
return self
def __exit__(self, *args, **kwargs):
self.delta = datetime.now() - self.tic
self.runtime = (self.delta.microseconds, self.delta)
xxxxxxxxxx
from fuzzingbook.ExpectError import ExpectError, ExpectTimeout
xxxxxxxxxx
from fuzzingbook.Parser import IterativeEarleyParser
xxxxxxxxxx
def process(s):
# see the rewrite fn. We remove newlines from grammar training to make it easier to visualize
return s.strip().replace('\n', ' ')
xxxxxxxxxx
def check_parse(grammar, inputstrs, start_symbol='<START>'):
count = 0
e = IterativeEarleyParser(grammar, start_symbol=start_symbol)
for s in inputstrs:
with ExpectError():
with ExpectTimeout(MaxParseTimeout):
for tree in e.parse(process(s)):
count += 1
break
return (count, len(inputstrs))
xxxxxxxxxx
from fuzzingbook.ExpectError import ExpectError, ExpectTimeout
xxxxxxxxxx
from fuzzingbook import GrammarFuzzer, Parser
xxxxxxxxxx
def save_grammar(grammar, tool, program):
with open("build/%s-%s.grammar.json" % (tool, program), 'w+') as f:
json.dump(grammar, f)
return {k:sorted(grammar[k]) for k in grammar}
xxxxxxxxxx
import string
xxxxxxxxxx
Mimid_p = {}
Mimid_r = {}
Autogram_p = {}
Autogram_r = {}
Mimid_t = {}
Autogram_t = {}
for k in program_src:
Mimid_p[k] = None
Mimid_r[k] = None
Mimid_t[k] = None
Autogram_p[k] = None
Autogram_r[k] = None
Autogram_t[k] = None
xxxxxxxxxx
import urllib.parse
xxxxxxxxxx
cgidecode_golden = {
"<START>": [
"<cgidecode-s>"
],
"<cgidecode-s>": [
'<cgidecode>',
'<cgidecode><cgidecode-s>'],
"<cgidecode>": [
"<single_char>",
"<percentage_char>"
],
"<single_char>": list(string.ascii_lowercase + string.ascii_uppercase + string.digits + "-./_~"),
"<percentage_char>": [urllib.parse.quote(i) for i in string.punctuation if i not in "-./_~"],
}
xxxxxxxxxx
cgidecode_samples = [↔
]
xxxxxxxxxx
with timeit() as t:
cgidecode_grammar = accio_grammar('cgidecode.py', VARS['cgidecode_src'], cgidecode_samples)
Mimid_t['cgidecode.py'] = t.runtime
xxxxxxxxxx
save_grammar(cgidecode_grammar, 'mimid', 'cgidecode')
{'<START>': ['<cgi_decode-1-s>'],
'<cgi_decode-1-s>': ['<cgi_decode-1>', '<cgi_decode-1><cgi_decode-1-s>'],
'<cgi_decode-1>': ['%<cgi_decode>',
'&',
'+',
'-',
'.',
'/',
'0',
'1',
'2',
'3',
'4',
'5',
'6',
'7',
'8',
'9',
':',
'=',
'?',
'A',
'B',
'C',
'D',
'E',
'F',
'G',
'H',
'I',
'J',
'K',
'L',
'M',
'N',
'O',
'P',
'Q',
'R',
'S',
'T',
'U',
'V',
'W',
'X',
'Y',
'Z',
'_',
'a',
'b',
'c',
'd',
'e',
'f',
'g',
'h',
'i',
'j',
'k',
'l',
'm',
'n',
'o',
'p',
'q',
'r',
's',
't',
'u',
'v',
'w',
'x',
'y',
'z',
'~'],
'<cgi_decode>': ['00',
'20',
'21',
'22',
'23',
'24',
'25',
'26',
'27',
'28',
'29',
'2A',
'2B',
'2C',
'2D',
'2E',
'2F',
'2a',
'2b',
'2c',
'2d',
'2e',
'2f',
'3A',
'3B',
'3C',
'3D',
'3E',
'3F',
'3a',
'3b',
'3c',
'3d',
'3e',
'3f',
'40',
'5B',
'5C',
'5D',
'5E',
'5F',
'5b',
'5c',
'5d',
'5e',
'5f',
'60',
'7B',
'7C',
'7D',
'7E',
'7b',
'7c',
'7d',
'7e']}
xxxxxxxxxx
if 'cgidecode' in CHECK:
result = check_precision('cgidecode.py', cgidecode_grammar)
Mimid_p['cgidecode.py'] = result
print(result)
(1000, 1000)
xxxxxxxxxx
import subjects.cgidecode
xxxxxxxxxx
if 'cgidecode' in CHECK:
result = check_recall(cgidecode_golden, cgidecode_grammar, subjects.cgidecode.main)
Mimid_r['cgidecode.py'] = result
print(result)
(1000, 1000)
xxxxxxxxxx
%%time
with timeit() as t:
autogram_cgidecode_grammar_t = recover_grammar_with_taints('cgidecode.py', VARS['cgidecode_src'], cgidecode_samples)
Autogram_t['cgidecode.py'] = t.runtime
CPU times: user 11.8 ms, sys: 8.75 ms, total: 20.6 ms Wall time: 24.9 s
xxxxxxxxxx
save_grammar(autogram_cgidecode_grammar_t, 'autogram_t', 'cgidecode')
{'<START>': ['<create@27:self>'],
'<create@27:self>': ['-',
'1',
'<__init__@15:self>',
'<__init__@15:self><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>',
'<__init__@15:self><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>+<cgi_decode@19:c><cgi_decode@19:c>+me<cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>zin<cgi_decode@19:c><cgi_decode@19:c>oo<cgi_decode@19:c><cgi_decode@19:c>o<cgi_decode@19:c>g',
'<__init__@15:self><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>',
'<__init__@15:self><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>',
'<__init__@15:self><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>e<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>a<cgi_decode@19:c>s=<cgi_decode@19:c><cgi_decode@19:c>1&ma<cgi_decode@19:c><cgi_decode@19:c>=<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>2<cgi_decode@23:digit_low>+2+%<cgi_decode@23:digit_high><cgi_decode@23:digit_low>+<cgi_decode@19:c>&',
'<__init__@15:self><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>e<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>at<cgi_decode@19:c>s=<cgi_decode@19:c><cgi_decode@19:c>od&status=<cgi_decode@19:c>a<cgi_decode@19:c>p<cgi_decode@19:c>&',
'<__init__@15:self><cgi_decode@19:c><cgi_decode@19:c>l<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>o<cgi_decode@19:c>l<cgi_decode@19:c>%2<cgi_decode@23:digit_low>',
'<__init__@15:self><cgi_decode@19:c><cgi_decode@19:c>o<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low>%<cgi_decode@23:digit_high><cgi_decode@23:digit_low>%20<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>%20%23%20<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>en<cgi_decode@19:c>%20%2<cgi_decode@23:digit_low>',
'<__init__@15:self><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%22<cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c>J%2<cgi_decode@23:digit_low><cgi_decode@19:c>B%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c>',
'<__init__@15:self><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c>N%2<cgi_decode@23:digit_low><cgi_decode@19:c>V%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low>Ae%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c>f%2EB',
'<__init__@15:self><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%3<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%3<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%<cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%<cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c>h%5<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%5<cgi_decode@23:digit_low><cgi_decode@19:c>D%5<cgi_decode@23:digit_low>DR%5<cgi_decode@23:digit_low>c<cgi_decode@19:c>%5<cgi_decode@23:digit_low><cgi_decode@19:c>',
'<__init__@15:self><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%5<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%5<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%5<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%<cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%<cgi_decode@23:digit_high>b<cgi_decode@19:c><cgi_decode@19:c>%7<cgi_decode@23:digit_low>h<cgi_decode@19:c>%7<cgi_decode@23:digit_low>mB%7e<cgi_decode@19:c>c%7B<cgi_decode@19:c>',
'<__init__@15:self><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%7<cgi_decode@23:digit_low>C<cgi_decode@19:c>%7<cgi_decode@23:digit_low><cgi_decode@19:c>',
'<__init__@15:self><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%<cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c>F%3<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%3<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%3<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%3<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%3<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%3Ay<cgi_decode@19:c>%3B<cgi_decode@19:c>q%3C<cgi_decode@19:c>',
'<__init__@15:self><cgi_decode@19:c>t<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>/t<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>t/<cgi_decode@19:c><cgi_decode@19:c>g<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>a<cgi_decode@19:c>p<cgi_decode@19:c><cgi_decode@19:c>seri<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>ob<cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low>%<cgi_decode@23:digit_high>b%2<cgi_decode@23:digit_low>update%20logintable%20set%20pass<cgi_decode@19:c>d%3d%270wn3d%27%3b<cgi_decode@19:c>-%00',
'<__init__@15:self><cgi_decode@19:c>t<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>/t<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>t/get<cgi_decode@19:c>ata<cgi_decode@19:c>php<cgi_decode@19:c>data<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c>cr<cgi_decode@19:c>pt%<cgi_decode@23:digit_high><cgi_decode@23:digit_low>src=%22<cgi_decode@31:t>tp%3a%2<cgi_decode@23:digit_low>%2f',
'<__init__@15:self><cgi_decode@31:t><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>a<cgi_decode@19:c><cgi_decode@19:c>.c<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c>a<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>.<cgi_decode@19:c>s%22%<cgi_decode@23:digit_high>e%3c%2fsc<cgi_decode@19:c><cgi_decode@19:c>pt%3e',
'<cgi_decode@19:c>',
'<cgi_decode@23:digit_low>',
'C',
'H',
'O',
'S',
'W',
'a',
'h',
'n',
'w',
'y'],
'<__init__@15:self>': ['<cgi_decode@19:c>', '<cgi_decode@31:t>'],
'<cgi_decode@19:c>': ['%', '+', '<__add__@1115:other>', '<create@27:self>'],
'<cgi_decode@23:digit_high>': ['2',
'3',
'4',
'5',
'6',
'7',
'<cgi_decode@19:c>',
'<cgi_decode@23:digit_low>'],
'<cgi_decode@23:digit_low>': ['0',
'1',
'2',
'3',
'4',
'5',
'6',
'7',
'8',
'9',
'<cgi_decode@19:c>',
'<cgi_decode@23:digit_high>',
'A',
'B',
'C',
'D',
'E',
'F',
'a',
'b',
'c',
'd',
'e',
'f'],
'<cgi_decode@31:t>': ['<__add__@1115:other>',
'<__add__@1115:other>w',
'<__add__@1115:self>',
'h<__add__@1115:other>'],
'<__add__@1115:other>': ['&',
'-',
'.',
'/',
'0',
'1',
'2',
'3',
'4',
'5',
'6',
'7',
'8',
'9',
':',
'<__add__@1115:self>',
'<cgi_decode@19:c>',
'<cgi_decode@23:digit_high>',
'<cgi_decode@23:digit_low>',
'=',
'?',
'A',
'B',
'C',
'D',
'E',
'F',
'G',
'H',
'I',
'J',
'K',
'L',
'M',
'N',
'O',
'P',
'Q',
'R',
'S',
'T',
'U',
'V',
'W',
'X',
'Y',
'Z',
'_',
'a',
'b',
'c',
'd',
'e',
'f',
'g',
'h',
'i',
'j',
'k',
'l',
'm',
'n',
'o',
'p',
'q',
'r',
's',
't',
'u',
'v',
'w',
'x',
'y',
'z',
'~'],
'<__add__@1115:self>': ['<__add__@1115:other>', '<create@27:self>']}
xxxxxxxxxx
if 'cgidecode' in CHECK:
result = check_precision('cgidecode.py', autogram_cgidecode_grammar_t)
Autogram_p['cgidecode.py'] = result
print(result)
(460, 1000)
xxxxxxxxxx
if 'cgidecode' in CHECK:
result = check_recall(cgidecode_golden, autogram_cgidecode_grammar_t, subjects.cgidecode.main)
Autogram_r['cgidecode.py'] = result
print(result)
(380, 1000)
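Both metrics here are computed by cross-fuzzing: `check_precision` fuzzes the mined grammar and counts how many generated inputs the subject program accepts, while `check_recall` fuzzes the golden grammar and counts how many inputs a parser for the mined grammar accepts. The precision side can be sketched as follows (a toy illustration with hypothetical helper names, not the notebook's actual implementation):

```python
import random

def accepts(parse, s):
    # A recognizer in this setting is any callable that raises on invalid input.
    try:
        parse(s)
        return True
    except Exception:
        return False

def precision_sketch(parse, fuzz, n=100):
    # Fraction of inputs generated from the (mined) grammar that the subject accepts.
    ok = sum(accepts(parse, fuzz()) for _ in range(n))
    return ok, n

# Toy subject: int() accepts digit strings; a perfectly precise fuzzer
# produces only digit strings, so precision is n out of n.
digit_fuzz = lambda: ''.join(random.choices('0123456789', k=3))
print(precision_sketch(int, digit_fuzz))  # (100, 100)
```

The recall direction is symmetric, except that the acceptor is a grammar-based parser (e.g. an Earley parser) over the mined grammar rather than the subject program.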
xxxxxxxxxx
calc_golden = {
"<START>": [
"<expr>"
],
"<expr>": [
"<term>+<expr>",
"<term>-<expr>",
"<term>"
],
"<term>": [
"<factor>*<term>",
"<factor>/<term>",
"<factor>"
],
"<factor>": [
"(<expr>)",
"<number>"
],
"<number>": [
"<integer>.<integer>",
"<integer>"
],
"<integer>": [
"<digit><integer>",
"<digit>"
],
"<digit>": [ "0", "1", "2", "3", "4", "5", "6", "7", "8", "9" ]
}
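Grammars throughout this notebook are plain dicts mapping each nonterminal to a list of expansion alternatives, so the golden grammar above can be fuzzed directly. The following is a minimal, hypothetical fuzzer sketch (not the notebook's own machinery): it expands nonterminals randomly until a depth budget is exhausted, then falls back to the cheapest alternative to guarantee termination.

```python
import random
import re

NONTERMINAL = re.compile(r'(<[^<> ]+>)')

def tokens(rule):
    # Split an expansion like "<term>+<expr>" into ['<term>', '+', '<expr>'].
    return [t for t in NONTERMINAL.split(rule) if t]

def min_cost(grammar, key, seen=frozenset()):
    # Length of the shortest string derivable from `key`; inf on unavoidable recursion.
    if key not in grammar:
        return len(key)
    if key in seen:
        return float('inf')
    return min(sum(min_cost(grammar, t, seen | {key}) for t in tokens(rule))
               for rule in grammar[key])

def fuzz(grammar, key='<START>', max_depth=10, depth=0):
    if key not in grammar:
        return key
    if depth < max_depth:
        rule = random.choice(grammar[key])
    else:
        # Past the depth budget, force termination via the cheapest expansion.
        rule = min(grammar[key],
                   key=lambda r: sum(min_cost(grammar, t) for t in tokens(r)))
    return ''.join(fuzz(grammar, t, max_depth, depth + 1) for t in tokens(rule))

calc_golden = {
    '<START>': ['<expr>'],
    '<expr>': ['<term>+<expr>', '<term>-<expr>', '<term>'],
    '<term>': ['<factor>*<term>', '<factor>/<term>', '<factor>'],
    '<factor>': ['(<expr>)', '<number>'],
    '<number>': ['<integer>.<integer>', '<integer>'],
    '<integer>': ['<digit><integer>', '<digit>'],
    '<digit>': [str(i) for i in range(10)],
}

random.seed(1)
print(fuzz(calc_golden))
```

Every output is a syntactically valid arithmetic expression; this is the sense in which a fuzzer run against the *mined* grammar measures precision against the subject.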
xxxxxxxxxx
calc_samples = [i.strip() for i in '''\
(1+2)*3/(423-334+9983)-5-((6)-(701))
(123+133*(12-3)/9+8)+33
(100)
21*3
33/44+2
100
23*234*22*4
(123+133*(12-3)/9+8)+33
1+2
31/20-2
555+(234-445)
1-(41/2)
443-334+33-222
'''.split('\n') if i.strip()]
xxxxxxxxxx
%%time
with timeit() as t:
calc_grammar = accio_grammar('calculator.py', VARS['calc_src'], calc_samples)
Mimid_t['calculator.py'] = t.runtime
CPU times: user 332 ms, sys: 387 ms, total: 719 ms
Wall time: 6.72 s
xxxxxxxxxx
save_grammar(calc_grammar, 'mimid', 'calculator')
{'<START>': ['<parse_expr-0-c>'],
'<parse_expr-0-c>': ['<parse_expr-1>', '<parse_expr-2-s><parse_expr-1>'],
'<parse_expr-1>': ['(<parse_expr-0-c>)', '<parse_num-1-s>'],
'<parse_expr-2-s>': ['<parse_expr-1><parse_expr>',
'<parse_expr-1><parse_expr><parse_expr-2-s>'],
'<parse_num-1-s>': ['<is_digit-0-c>', '<is_digit-0-c><parse_num-1-s>'],
'<is_digit-0-c>': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
'<parse_expr>': ['*', '+', '-', '/']}
xxxxxxxxxx
if 'calculator' in CHECK:
result = check_precision('calculator.py', calc_grammar)
Mimid_p['calculator.py'] = result
print(result)
(1000, 1000)
xxxxxxxxxx
import subjects.calculator
xxxxxxxxxx
if 'calculator' in CHECK:
result = check_recall(calc_golden, calc_grammar, subjects.calculator.main)
Mimid_r['calculator.py'] = result
print(result)
(1000, 1000)
xxxxxxxxxx
%%time
with timeit() as t:
autogram_calc_grammar_t = recover_grammar_with_taints('calculator.py', VARS['calc_src'], calc_samples)
Autogram_t['calculator.py'] = t.runtime
CPU times: user 9.24 ms, sys: 6.19 ms, total: 15.4 ms
Wall time: 6.55 s
xxxxxxxxxx
save_grammar(autogram_calc_grammar_t, 'autogram_t', 'calculator')
{'<START>': ['<__init__@15:self>'],
'<__init__@15:self>': ['<parse_expr@26:c>00',
'<parse_expr@26:c>1+2)<parse_expr@26:c><parse_expr@26:c><parse_expr@26:c>(423<parse_expr@26:c>334+9983)-<parse_expr@26:c>-((6)-(701))',
'<parse_expr@26:c>100)',
'<parse_expr@26:c>12<parse_expr@26:c><parse_expr@26:c>1<parse_expr@29:num>*(12-3)/9+8)+33',
'<parse_expr@26:c>1<parse_expr@26:c><parse_expr@26:c>',
'<parse_expr@26:c>1<parse_expr@26:c><parse_expr@26:c>0<parse_expr@26:c>2',
'<parse_expr@26:c>3<parse_expr@26:c><parse_expr@26:c>4<parse_expr@26:c><parse_expr@26:c>',
'<parse_expr@26:c>3<parse_expr@26:c><parse_expr@29:num><parse_expr@26:c>*<parse_expr@29:num>*4',
'<parse_expr@26:c>4<parse_expr@26:c><parse_expr@26:c><parse_expr@29:num><parse_expr@26:c>33-<parse_expr@26:c>22',
'<parse_expr@26:c>55<parse_expr@26:c><parse_expr@26:c>234-445)',
'<parse_expr@26:c><parse_expr@26:c><parse_expr@26:c>',
'<parse_expr@26:c><parse_expr@26:c><parse_expr@26:c>41/2)'],
'<parse_expr@26:c>': ['(',
'*',
'+',
'-',
'/',
'1',
'2',
'3',
'4',
'5',
'<parse_expr@29:num>'],
'<parse_expr@29:num>': ['1', '2', '22', '23', '3', '33', '4', '5']}
xxxxxxxxxx
if 'calculator' in CHECK:
result = check_precision('calculator.py', autogram_calc_grammar_t)
Autogram_p['calculator.py'] = result
print(result)
(395, 1000)
xxxxxxxxxx
if 'calculator' in CHECK:
result = check_recall(calc_golden, autogram_calc_grammar_t, subjects.calculator.main)
Autogram_r['calculator.py'] = result
print(result)
(1, 1000)
xxxxxxxxxx
mathexpr_golden = {
"<START>": [
"<expr>"
],
"<word>": [
"pi",
"e",
"phi",
"abs",
"acos",
"asin",
"atan",
"atan2",
"ceil",
"cos",
"cosh",
"degrees",
"exp",
"fabs",
"floor",
"fmod",
"frexp",
"hypot",
"ldexp",
"log",
"log10",
"modf",
"pow",
"radians",
"sin",
"sinh",
"sqrt",
"tan",
"tanh",
"<alpha>"
],
"<alpha>": [ "a", "b", "c", "d", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"],
"<expr>": [
"<term>+<expr>",
"<term>-<expr>",
"<term>"
],
"<term>": [
"<factor>*<term>",
"<factor>/<term>",
"<factor>"
],
"<factor>": [
"+<factor>",
"-<factor>",
"(<expr>)",
"<word>(<expr>,<expr>)",
"<word>",
"<number>"
],
"<number>": [
"<integer>.<integer>",
"<integer>"
],
"<integer>": [
"<digit><integer>",
"<digit>"
],
"<digit>": [ "0", "1", "2", "3", "4", "5", "6", "7", "8", "9" ]
}
xxxxxxxxxx
mathexpr_samples = [i.strip() for i in '''
(pi*e+2)*3/(423-334+9983)-5-((6)-(701-x))
(123+133*(12-3)/9+8)+33
(100)
pi * e
(1 - 1 + -1) * pi
1.0 / 3 * 6
(x + e * 10) / 10
(a + b) / c
1 + pi / 4
(1-2)/3.0 + 0.0000
-(1 + 2) * 3
(1 + 2) * 3
100
1 + 2 * 3
23*234*22*4
(123+133*(12-3)/9+8)+33
1+2
31/20-2
555+(234-445)
1-(41/2)
443-334+33-222
cos(x+4*3) + 2 * 3
exp(0)
-(1 + 2) * 3
(1-2)/3.0 + 0.0000
abs(-2) + pi / 4
(pi + e * 10) / 10
1.0 / 3 * 6
cos(pi) * 1
atan2(2, 1)
hypot(5, 12)
pow(3, 5)
'''.strip().split('\n') if i.strip()]
xxxxxxxxxx
%%time
with timeit() as t:
mathexpr_grammar = accio_grammar('mathexpr.py', VARS['mathexpr_src'], mathexpr_samples, cache=False)
Mimid_t['mathexpr.py'] = t.runtime
CPU times: user 1.03 s, sys: 922 ms, total: 1.95 s
Wall time: 17 s
xxxxxxxxxx
save_grammar(mathexpr_grammar, 'mimid', 'mathexpr')
{'<START>': ['<parseAddition-1>'],
'<parseAddition-1>': ['<parseMultiplication-1>',
'<parseMultiplication-1><parseAddition-2-s>'],
'<parseMultiplication-1>': ['<parseParenthesis-0-c>',
'<parseParenthesis-0-c><parseMultiplication-2-s>'],
'<parseAddition-2-s>': ['<parseAddition>',
'<parseAddition><parseAddition-2-s>'],
'<parseParenthesis-0-c>': [' <parseNegative-0-c>',
'(<parseAddition-1>)',
'<parseNegative-0-c>'],
'<parseMultiplication-2-s>': ['<parseMultiplication>',
'<parseMultiplication><parseMultiplication-2-s>'],
'<parseNegative-0-c>': ['-<parseParenthesis-0-c>', '<parseValue-0-c>'],
'<parseValue-0-c>': ['<parseNumber-1-s>', '<parseVariable-0-c>'],
'<parseNumber-1-s>': ['<parseNumber>', '<parseNumber><parseNumber-1-s>'],
'<parseVariable-0-c>': ['a',
'a<parseVariable-9-c>',
'b',
'c',
'co<parseVariable-11>',
'e',
'exp<parseArguments-1>',
'hypot<parseArguments-1>',
'p<parseVariable-1-c>',
'x'],
'<parseNumber>': ['.', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
'<parseVariable-9-c>': ['b<parseVariable-11>', 'tan2<parseArguments-1>'],
'<parseVariable-11>': ['s<parseArguments-1>'],
'<parseArguments-1>': ['(<parseAddition-1><parseArguments-2-c>'],
'<parseVariable-1-c>': ['i', 'ow<parseArguments-1>'],
'<parseArguments-2-c>': [')', ', <parseAddition-1>)'],
'<parseMultiplication>': ['<parseMultiplication-2>',
'<parseMultiplication-3>',
'<parseMultiplication-5>'],
'<parseMultiplication-2>': ['*<parseParenthesis-0-c>'],
'<parseMultiplication-3>': [' ', ' <parseMultiplication-4>'],
'<parseMultiplication-5>': ['/<parseParenthesis-0-c>'],
'<parseMultiplication-4>': ['<parseMultiplication-2>',
'<parseMultiplication-5>'],
'<parseAddition>': ['+<parseMultiplication-1>', '-<parseMultiplication-1>']}
xxxxxxxxxx
if 'mathexpr' in CHECK:
result = check_precision('mathexpr.py', mathexpr_grammar)
Mimid_p['mathexpr.py'] = result
print(result)
(699, 1000)
xxxxxxxxxx
import subjects.mathexpr
xxxxxxxxxx
if 'mathexpr' in CHECK:
result = check_recall(mathexpr_golden, mathexpr_grammar, subjects.mathexpr.main)
Mimid_r['mathexpr.py'] = result
print(result)
(922, 1000)
xxxxxxxxxx
%%time
with timeit() as t:
autogram_mathexpr_grammar_t = recover_grammar_with_taints('mathexpr.py', VARS['mathexpr_src'], mathexpr_samples)
Autogram_t['mathexpr.py'] = t.runtime
CPU times: user 11.4 ms, sys: 9.45 ms, total: 20.8 ms
Wall time: 26.5 s
xxxxxxxxxx
save_grammar(autogram_mathexpr_grammar_t, 'autogram_t', 'mathexpr')
{'<START>': ['<hasnext@61:self.string>'],
'<hasnext@61:self.string>': ['<parseparenthesis@137:char> <parsemultiplication@111:char> 2 * 3',
'<parseparenthesis@137:char> <parsemultiplication@111:char> pi / 4',
'<parseparenthesis@137:char>(1 + 2) <parsemultiplication@111:char> 3',
'<parseparenthesis@137:char>.0 <parsemultiplication@111:char> 3 <parsemultiplication@111:char> 6',
'<parseparenthesis@137:char>00',
'<parseparenthesis@137:char>1 + 2) <parsemultiplication@111:char> 3',
'<parseparenthesis@137:char>1 - 1 + -1) <parsemultiplication@111:char> pi',
'<parseparenthesis@137:char>1-2)<parsemultiplication@111:char>3.0 <parsemultiplication@111:char> 0.0000',
'<parseparenthesis@137:char>100)',
'<parseparenthesis@137:char>123<parsemultiplication@111:char>133*(12-3)/9+8)+33',
'<parseparenthesis@137:char>1<parsemultiplication@111:char>20<parsemultiplication@111:char>2',
'<parseparenthesis@137:char>3<parsemultiplication@111:char>234*22*4',
'<parseparenthesis@137:char>43<parsemultiplication@111:char>334<parseaddition@93:char>33-222',
'<parseparenthesis@137:char>55<parsemultiplication@111:char>(234-445)',
'<parseparenthesis@137:char><parsemultiplication@111:char>(41/2)',
'<parseparenthesis@137:char><parsemultiplication@111:char>2',
'<parseparenthesis@137:char>a + b) <parsemultiplication@111:char> c',
'<parseparenthesis@137:char>bs(-2) <parsemultiplication@111:char> pi / 4',
'<parseparenthesis@137:char>i <parsemultiplication@111:char> e',
'<parseparenthesis@137:char>os(pi) <parsemultiplication@111:char> 1',
'<parseparenthesis@137:char>os(x<parsemultiplication@111:char>4*3) + 2 * 3',
'<parseparenthesis@137:char>ow(3, 5)',
'<parseparenthesis@137:char>pi + e * 10) <parsemultiplication@111:char> 10',
'<parseparenthesis@137:char>pi<parsemultiplication@111:char>e+2)*3<parsemultiplication@111:char>(423<parsemultiplication@111:char>334+9983)-5-((6)-(701-x))',
'<parseparenthesis@137:char>tan2(2, 1)',
'<parseparenthesis@137:char>x + e * 10) <parsemultiplication@111:char> 10',
'<parseparenthesis@137:char>xp(0)',
'<parseparenthesis@137:char>ypot(5, 12)'],
'<parseparenthesis@137:char>': ['(',
'-',
'1',
'2',
'3',
'4',
'5',
'a',
'c',
'e',
'h',
'p'],
'<parsemultiplication@111:char>': ['*', '/', '<parseaddition@93:char>'],
'<parseaddition@93:char>': ['+', '-']}
xxxxxxxxxx
if 'mathexpr' in CHECK:
result = check_precision('mathexpr.py', autogram_mathexpr_grammar_t)
Autogram_p['mathexpr.py'] = result
print(result)
(301, 1000)
xxxxxxxxxx
if 'mathexpr' in CHECK:
result = check_recall(mathexpr_golden, autogram_mathexpr_grammar_t, subjects.mathexpr.main)
Autogram_r['mathexpr.py'] = result
print(result)
(0, 1000)
xxxxxxxxxx
urlparse_golden = {
"<START>": [
"<url>"
],
"<url>": [
"<scheme>://<authority><path><query>"
],
"<scheme>": [
"http",
"https",
"ftp",
"ftps"
],
"<authority>": [
"<host>",
"<host>:<port>",
"<userinfo>@<host>",
"<userinfo>@<host>:<port>"
],
"<user>": [
"user1",
"user2",
"user3",
"user4",
"user5"
],
"<pass>": [
"pass1",
"pass2",
"pass3",
"pass4",
"pass5"
],
"<host>": [
"host1",
"host2",
"host3",
"host4",
"host5"
],
"<port>": [
"<nat>"
],
"<nat>": [
"10",
"20",
"30",
"40",
"50"
],
"<userinfo>": [
"<user>:<pass>"
],
"<path>": [
"",
"/",
"/<id>",
"/<id><path>"
],
"<id>": [
"folder"
],
"<query>": [
"",
"?<params>"
],
"<params>": [
"<param>",
"<param>&<params>"
],
"<param>": [
"<key>=<value>"
],
"<key>": [
"key1",
"key2",
"key3",
"key4"
],
"<value>": [
"value1",
"value2",
"value3",
"value4"
]
}
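The nonterminals mined below (`<urlsplit-...>`, `<_splitnetloc-0-c>`) are named after functions in the subject, which follows the structure of Python's standard `urllib.parse`. For illustration, here is how the standard library decomposes one of the golden-grammar samples into the components the grammar describes:

```python
from urllib.parse import urlsplit

u = urlsplit('https://user4:pass2@host2:30/folder//?key1=value3')
print(u.scheme)                # 'https'
print(u.username, u.password)  # 'user4' 'pass2'
print(u.hostname, u.port)      # 'host2' 30
print(u.path, u.query)         # '/folder//' 'key1=value3'
```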
xxxxxxxxxx
urlparse_samples = [i.strip() for i in '''
http://www.python.org
http://www.python.org#abc
http://www.python.org'
http://www.python.org#abc'
http://www.python.org?q=abc
http://www.python.org/#abc
http://a/b/c/d;p?q#f
https://www.python.org
https://www.python.org#abc
https://www.python.org?q=abc
https://www.python.org/#abc
https://a/b/c/d;p?q#f
http://www.python.org?q=abc
file:///tmp/junk.txt
imap://mail.python.org/mbox1
mms://wms.sys.hinet.net/cts/Drama/09006251100.asf
nfs://server/path/to/file.txt
svn+ssh://svn.zope.org/repos/main/ZConfig/trunk/
git+ssh://git@github.com/user/project.git
file:///tmp/junk.txt
imap://mail.python.org/mbox1
mms://wms.sys.hinet.net/cts/Drama/09006251100.asf
nfs://server/path/to/file.txt
http://www.python.org/#abc
svn+ssh://svn.zope.org/repos/main/ZConfig/trunk/
git+ssh://git@github.com/user/project.git
g:h
http://a/b/c/g
http://a/b/c/g/
http://a/g
http://g
http://a/b/c/g?y
http://a/b/c/g?y/./x
http://a/b/c/d;p?q#f
http://a/b/c/d;p?q#s
http://a/b/c/g#s
http://a/b/c/g#s/./x
http://a/b/c/g?y#s
http://a/b/c/g;x
http://a/b/c/g;x?y#s
http://a/b/c/
http://a/b/
https://www.python.org
http://a/b/g
http://a/
http://a/g
http://a/b/c/d;p?q#f
http://a/../g
g:h
http://a/b/c/g
http://a/b/c/g/
https://www.python.org#abc
http://g
http://a/b/c/g?y
http://a/b/c/d;p?q#s
http://a/b/c/g#s
http://a/b/c/g?y#s
http://a/b/c/g;x
http://a/b/c/g;x?y#s
https://www.python.org?q=abc
https://www.python.org/#abc
http://[::1]:5432/foo/
http://[dead:beef::1]:5432/foo/
http://[dead:beef::]:5432/foo/
http://[dead:beef:cafe:5417:affe:8FA3:deaf:feed]:5432/foo/
http://[::12.34.56.78]:5432/foo/
http://[::ffff:12.34.56.78]:5432/foo/
http://Test.python.org/foo/
http://12.34.56.78/foo/
http://[::1]/foo/
http://[dead:beef::1]/foo/
https://a/b/c/d;p?q#f
http://[dead:beef::]/foo/
http://[dead:beef:cafe:5417:affe:8FA3:deaf:feed]/foo/
http://[::12.34.56.78]/foo/
http://[::ffff:12.34.56.78]/foo/
http://Test.python.org:5432/foo/
http://12.34.56.78:5432/foo/
http://[::1]:5432/foo/
http://[dead:beef::1]:5432/foo/
http://[dead:beef::]:5432/foo/
http://[dead:beef:cafe:5417:affe:8FA3:deaf:feed]:5432/foo/
'''.strip().split('\n') if i.strip()]
Unfortunately, as we detail in the paper, both miners are unable to generalize well from inputs like those above. The problem is the lack of generalization of string tokens. Hence, we use the samples below, which are generated by fuzzing the _golden grammar_ to produce 100 inputs, captured here for deterministic reproduction.
xxxxxxxxxx
urlparse_samples = [i.strip() for i in '''
https://user4:pass2@host2:30/folder//?key1=value3
ftp://user2:pass5@host2?key3=value1
ftp://host1/folder//
ftp://host4:30/folder
http://user1:pass4@host1/folder
https://user1:pass4@host4
ftp://host3:40/
http://user5:pass3@host1:10/
http://host4:10
ftp://host4/folder//?key4=value2
https://host5/folder
ftp://user4:pass5@host4/folder//folder//folder/
ftp://user5:pass2@host3
https://host2/
https://user4:pass3@host3/folder
http://host5:50
https://host3/folder?key3=value3
http://user5:pass3@host1/folder?key1=value4&key4=value2&key2=value1&key2=value3
https://user4:pass3@host1/folder
http://user3:pass3@host2:40/
ftp://host2/folder?key2=value3
https://user4:pass4@host2:50/folder/
https://user3:pass5@host4?key4=value1
ftp://user3:pass3@host1:40?key1=value3
https://user1:pass1@host3:50
ftps://user2:pass2@host3/
https://host4:30/folder
http://host5/folder/?key2=value2
ftps://host3:10/folder/
ftp://user4:pass4@host5/folder
http://user2:pass2@host4:10/folder//folder//folder/
ftp://host1:10/folder/
ftp://host3?key3=value1&key1=value3
ftp://user5:pass2@host4/folder//
http://host2
ftps://user5:pass3@host3:30
ftp://host5/folder
https://user2:pass2@host4:20/?key2=value4&key1=value2&key3=value3&key3=value2&key4=value3
https://host3/folder//folder//folder
ftp://user2:pass3@host4:50/
ftps://user5:pass5@host4/
ftps://user3:pass3@host5?key3=value3
ftp://host4?key1=value3&key3=value3&key3=value1
https://host3/?key4=value2&key1=value2&key4=value3&key2=value4
ftps://host1/folder//
ftp://host5/folder//
https://user2:pass1@host5:10/folder//
http://user5:pass5@host2:10/folder
https://host5/folder
ftps://user5:pass3@host4:40/?key1=value3
http://user1:pass3@host4/folder//?key4=value4&key3=value3
http://user2:pass2@host5:50/folder?key4=value3&key4=value2
http://host3?key3=value3&key2=value2
https://user3:pass3@host2:20/folder
https://host5/folder?key2=value1&key3=value2&key1=value4&key3=value4&key3=value1&key1=value2&key1=value2
ftp://user2:pass5@host5:40/?key4=value4
https://user3:pass4@host2:20/
ftps://host3:30/?key3=value1
ftp://host3/folder
ftps://user1:pass1@host5:20/?key3=value1
https://user4:pass5@host3?key4=value2
ftp://host4:40/folder?key3=value1
ftps://host2/folder//folder
https://host2
https://user2:pass5@host5:50?key1=value4&key1=value1&key2=value1&key2=value1
https://user4:pass5@host1/?key1=value2&key1=value1
http://host4:40/folder?key4=value3&key4=value2
http://host1:40
ftps://host3:30/
ftps://host1/folder/?key4=value1&key1=value4
http://user1:pass1@host1:10/folder/?key2=value2&key2=value3&key3=value4
http://host3/folder?key2=value2
ftps://user4:pass3@host3:50/?key1=value4
ftp://host2/folder//folder
ftp://user2:pass4@host4:40/folder?key3=value2&key2=value1&key2=value2&key4=value3&key3=value3&key3=value1
ftps://user4:pass5@host4:50?key4=value2
https://host3:10
ftp://user1:pass3@host3:10/folder/
ftps://host4:30/
ftp://user4:pass2@host1/folder/?key3=value2&key2=value4&key1=value3&key3=value2
https://host2/folder?key3=value3&key4=value4&key2=value2
ftp://host2:50/?key2=value4&key2=value4&key4=value1&key2=value2&key2=value3&key4=value1
ftps://user2:pass4@host2/
ftps://host3:40/
ftps://user4:pass5@host2/
ftp://host2:10/?key3=value3&key4=value1
http://host2/folder/?key3=value1&key2=value4
https://host5/folder?key4=value2
https://user3:pass4@host1:20
ftp://user3:pass3@host5/
https://user1:pass4@host5/
https://user3:pass2@host1/folder//
ftps://host5:30?key1=value1&key2=value3&key3=value2&key2=value3&key4=value2&key2=value3
ftps://user2:pass5@host3:30?key3=value2
ftps://host4:10/?key1=value1&key4=value3
https://host2:30
https://host5:40/folder
http://user2:pass4@host5:50/folder
ftp://user5:pass1@host3:50?key3=value2&key1=value4
ftp://host1/folder//folder
'''.strip().split('\n') if i.strip()]
xxxxxxxxxx
%%time
with timeit() as t:
urlparse_grammar = accio_grammar('urlparse.py', VARS['urlparse_src'], urlparse_samples)
Mimid_t['urlparse.py'] = t.runtime
CPU times: user 356 ms, sys: 265 ms, total: 621 ms
Wall time: 6.13 s
xxxxxxxxxx
save_grammar(urlparse_grammar, 'mimid', 'urlparse')
{'<START>': ['<urlparse-1>'],
'<urlparse-1>': ['<urlsplit-1>', '<urlsplit-1>/'],
'<urlsplit-1>': ['<urlsplit-7>', '<urlsplit-7><urlsplit-1-c>'],
'<urlsplit-7>': ['<urlsplit-20>', 'f', 'http:<urlsplit-18>', 'https'],
'<urlsplit-1-c>': ['://<urlsplit-16>',
'host1:40',
'host2',
'host4:10',
'host5:50',
's<urlsplit-8-c>',
'tp://<urlsplit-16>',
'tp<urlsplit-13-c>',
'tps<urlsplit-4-c>'],
'<urlsplit-20>': ['http', 'http://<_splitnetloc-0-c>'],
'<urlsplit-18>': ['//', '//<urlsplit-19>'],
'<_splitnetloc-0-c>': ['host1',
'host1/folder//',
'host1/folder//folder',
'host1:10/folder/',
'host2',
'host2/folder//folder',
'host2:10',
'host2:50',
'host3',
'host3/folder',
'host3/folder//folder//folder',
'host3:10/folder/',
'host3:30',
'host3:40',
'host4',
'host4:10',
'host4:30',
'host4:30/folder',
'host4:40',
'host5',
'host5/folder',
'host5/folder//',
'host5:30',
'host5:40/folder',
'user1:pass1@host1:10',
'user1:pass1@host5:20',
'user1:pass3@host3:10/folder/',
'user1:pass3@host4',
'user1:pass4@host1/folder',
'user1:pass4@host5',
'user2:pass1@host5:10/folder//',
'user2:pass2@host3',
'user2:pass2@host4:10/folder//folder//folder/',
'user2:pass2@host4:20',
'user2:pass2@host5:50',
'user2:pass3@host4:50',
'user2:pass4@host2',
'user2:pass4@host4:40',
'user2:pass4@host5:50/folder',
'user2:pass5@host2',
'user2:pass5@host3:30',
'user2:pass5@host5:40',
'user2:pass5@host5:50',
'user3:pass2@host1/folder//',
'user3:pass3@host1:40',
'user3:pass3@host2:20/folder',
'user3:pass3@host2:40',
'user3:pass3@host5',
'user3:pass4@host2:20',
'user3:pass5@host4',
'user4:pass2@host1',
'user4:pass2@host2:30',
'user4:pass3@host1/folder',
'user4:pass3@host3/folder',
'user4:pass3@host3:50',
'user4:pass4@host2:50/folder/',
'user4:pass4@host5/folder',
'user4:pass5@host1',
'user4:pass5@host2',
'user4:pass5@host3',
'user4:pass5@host4/folder//folder//folder/',
'user4:pass5@host4:50',
'user5:pass1@host3:50',
'user5:pass2@host4/folder//',
'user5:pass3@host1',
'user5:pass3@host1:10',
'user5:pass3@host4:40',
'user5:pass5@host2:10/folder',
'user5:pass5@host4'],
'<urlsplit-19>': ['<_splitnetloc-0-c>', '<_splitnetloc-0-c><urlsplit-2>'],
'<urlsplit-2>': ['/folder//?key4=value4&key3=value3',
'/folder/?key2=value2',
'/folder/?key2=value2&key2=value3&key3=value4',
'/folder/?key3=value1&key2=value4',
'/folder?key1=value4&key4=value2&key2=value1&key2=value3',
'/folder?key2=value2',
'/folder?key4=value3&key4=value2',
'?key3=value3&key2=value2'],
'<urlsplit-16>': ['<_splitnetloc-0-c>', '<_splitnetloc-0-c>/<urlsplit>'],
'<urlsplit-8-c>': ['://host2',
'://host2:30',
'://host3:10',
'://user1:pass1@host3:50',
'://user1:pass4@host4',
'://user3:pass4@host1:20',
'<urlsplit-9>'],
'<urlsplit-13-c>': ['://user5:pass2@host3', '<urlsplit-9>'],
'<urlsplit-4-c>': ['://<urlsplit-6>', '://user5:pass3@host3:30'],
'<urlsplit>': ['/?key1=value1&key4=value3',
'/?key1=value3',
'/?key1=value4',
'/?key3=value1',
'/folder//?key1=value3',
'/folder//?key4=value2',
'/folder/?key3=value2&key2=value4&key1=value3&key3=value2',
'/folder/?key4=value1&key1=value4',
'/folder?key2=value1&key3=value2&key1=value4&key3=value4&key3=value1&key1=value2&key1=value2',
'/folder?key2=value3',
'/folder?key3=value1',
'/folder?key3=value2&key2=value1&key2=value2&key4=value3&key3=value3&key3=value1',
'/folder?key3=value3',
'/folder?key3=value3&key4=value4&key2=value2',
'/folder?key4=value2',
'?key1=value1&key2=value3&key3=value2&key2=value3&key4=value2&key2=value3',
'?key1=value2&key1=value1',
'?key1=value3',
'?key1=value3&key3=value3&key3=value1',
'?key1=value4&key1=value1&key2=value1&key2=value1',
'?key2=value4&key1=value2&key3=value3&key3=value2&key4=value3',
'?key2=value4&key2=value4&key4=value1&key2=value2&key2=value3&key4=value1',
'?key3=value1',
'?key3=value1&key1=value3',
'?key3=value2',
'?key3=value2&key1=value4',
'?key3=value3',
'?key3=value3&key4=value1',
'?key4=value1',
'?key4=value2',
'?key4=value2&key1=value2&key4=value3&key2=value4',
'?key4=value4'],
'<urlsplit-9>': ['://<urlsplit-10>'],
'<urlsplit-10>': ['<_splitnetloc-0-c>', '<_splitnetloc-0-c><urlsplit>'],
'<urlsplit-6>': ['<_splitnetloc-0-c>', '<_splitnetloc-0-c><urlsplit-6-c>'],
'<urlsplit-6-c>': ['/', '<urlsplit>']}
xxxxxxxxxx
if 'urlparse' in CHECK:
result = check_precision('urlparse.py', urlparse_grammar)
Mimid_p['urlparse.py'] = result
print(result)
(1000, 1000)
xxxxxxxxxx
import subjects.urlparse
xxxxxxxxxx
if 'urlparse' in CHECK:
result = check_recall(urlparse_golden, urlparse_grammar, subjects.urlparse.main)
Mimid_r['urlparse.py'] = result
print(result)
(153, 1000)
xxxxxxxxxx
%%time
with timeit() as t:
autogram_urlparse_grammar_t = recover_grammar_with_taints('urlparse.py', VARS['urlparse_src'], urlparse_samples)
Autogram_t['urlparse.py'] = t.runtime
CPU times: user 12.7 ms, sys: 6.55 ms, total: 19.3 ms
Wall time: 48 s
xxxxxxxxxx
save_grammar(autogram_urlparse_grammar_t, 'autogram_t', 'urlparse')
{'<START>': ['<create@27:self>'],
'<create@27:self>': ['<__init__@15:self>:<__init__@1047:self._ostr>',
'<__init__@15:self>:<create@27:self>',
'<__init__@15:self><__init__@1047:self._ostr>',
'<__init__@15:self><__init__@1047:self._ostr>/',
'<__init__@15:self><__init__@1047:self._ostr><urlsplit@434:url>',
'<__init__@15:self><__init__@1047:self._ostr><urlsplit@458:url>',
'<__init__@15:self>?<_split_helper@1259:item>',
'?<_split_helper@1259:item>'],
'<__init__@15:self>': ['//',
'<__new__@1:path>/',
'<__new__@1:scheme>',
'<_split_helper@1259:item>',
'<urlsplit@434:url>/',
'<urlsplit@446:c><urlsplit@446:c><urlsplit@446:c>',
'<urlsplit@446:c><urlsplit@446:c><urlsplit@446:c><urlsplit@446:c>',
'<urlsplit@446:c><urlsplit@446:c>t<urlsplit@446:c><urlsplit@446:c>',
'<urlsplit@458:url>/'],
'<__init__@1047:self._ostr>': ['<__new__@1:netloc>', '<create@27:self>'],
'<urlsplit@434:url>': ['<__new__@1:path>', '<create@27:self>'],
'<urlsplit@458:url>': ['<__new__@1:path>', '<create@27:self>'],
'<_split_helper@1259:item>': ['/', '<__new__@1:path>', '<__new__@1:query>'],
'<__new__@1:path>': ['/',
'/folder',
'/folder/',
'/folder//',
'/folder//folder',
'/folder//folder//folder',
'/folder//folder//folder/',
'<__new__@1:path>'],
'<__new__@1:scheme>': ['<__new__@1:scheme>', 'http'],
'<urlsplit@446:c>': ['f', 'h', 'p', 's', 't'],
'<__new__@1:netloc>': ['host1',
'host1:10',
'host1:40',
'host2',
'host2:10',
'host2:30',
'host2:50',
'host3',
'host3:10',
'host3:30',
'host3:40',
'host4',
'host4:10',
'host4:30',
'host4:40',
'host5',
'host5:30',
'host5:40',
'host5:50',
'user1:pass1@host1:10',
'user1:pass1@host3:50',
'user1:pass1@host5:20',
'user1:pass3@host3:10',
'user1:pass3@host4',
'user1:pass4@host1',
'user1:pass4@host4',
'user1:pass4@host5',
'user2:pass1@host5:10',
'user2:pass2@host3',
'user2:pass2@host4:10',
'user2:pass2@host4:20',
'user2:pass2@host5:50',
'user2:pass3@host4:50',
'user2:pass4@host2',
'user2:pass4@host4:40',
'user2:pass4@host5:50',
'user2:pass5@host2',
'user2:pass5@host3:30',
'user2:pass5@host5:40',
'user2:pass5@host5:50',
'user3:pass2@host1',
'user3:pass3@host1:40',
'user3:pass3@host2:20',
'user3:pass3@host2:40',
'user3:pass3@host5',
'user3:pass4@host1:20',
'user3:pass4@host2:20',
'user3:pass5@host4',
'user4:pass2@host1',
'user4:pass2@host2:30',
'user4:pass3@host1',
'user4:pass3@host3',
'user4:pass3@host3:50',
'user4:pass4@host2:50',
'user4:pass4@host5',
'user4:pass5@host1',
'user4:pass5@host2',
'user4:pass5@host3',
'user4:pass5@host4',
'user4:pass5@host4:50',
'user5:pass1@host3:50',
'user5:pass2@host3',
'user5:pass2@host4',
'user5:pass3@host1',
'user5:pass3@host1:10',
'user5:pass3@host3:30',
'user5:pass3@host4:40',
'user5:pass5@host2:10',
'user5:pass5@host4'],
'<__new__@1:query>': ['<__new__@1:query>',
'key1=value1&key2=value3&key3=value2&key2=value3&key4=value2&key2=value3',
'key1=value1&key4=value3',
'key1=value2&key1=value1',
'key1=value3',
'key1=value3&key3=value3&key3=value1',
'key1=value4',
'key1=value4&key1=value1&key2=value1&key2=value1',
'key1=value4&key4=value2&key2=value1&key2=value3',
'key2=value1&key3=value2&key1=value4&key3=value4&key3=value1&key1=value2&key1=value2',
'key2=value2',
'key2=value2&key2=value3&key3=value4',
'key2=value3',
'key2=value4&key1=value2&key3=value3&key3=value2&key4=value3',
'key2=value4&key2=value4&key4=value1&key2=value2&key2=value3&key4=value1',
'key3=value1',
'key3=value1&key1=value3',
'key3=value1&key2=value4',
'key3=value2',
'key3=value2&key1=value4',
'key3=value2&key2=value1&key2=value2&key4=value3&key3=value3&key3=value1',
'key3=value2&key2=value4&key1=value3&key3=value2',
'key3=value3',
'key3=value3&key2=value2',
'key3=value3&key4=value1',
'key3=value3&key4=value4&key2=value2',
'key4=value1',
'key4=value1&key1=value4',
'key4=value2',
'key4=value2&key1=value2&key4=value3&key2=value4',
'key4=value3&key4=value2',
'key4=value4',
'key4=value4&key3=value3']}
xxxxxxxxxx
if 'urlparse' in CHECK:
result = check_precision('urlparse.py', autogram_urlparse_grammar_t)
Autogram_p['urlparse.py'] = result
print(result)
(1000, 1000)
xxxxxxxxxx
if 'urlparse' in CHECK:
result = check_recall(urlparse_golden, autogram_urlparse_grammar_t, subjects.urlparse.main)
Autogram_r['urlparse.py'] = result
print(result)
(277, 1000)
xxxxxxxxxx
netrc_golden = {
"<START>": [
"<entries>"
],
"<entries>": [
"<entry><whitespace><entries>",
"<entry>"
],
"<entry>": [
"machine<whitespace><mvalue><whitespace><fills>",
"default<whitespace<whitespace><fills>"
],
"<whitespace>": [
" "
],
"<mvalue>": [
"m1",
"m2",
"m3"
],
"<accvalue>": [
"a1",
"a2",
"a3"
],
"<uservalue>": [
"u1",
"u2",
"u3"
],
"<passvalue>": [
"pwd1",
"pwd2",
"pwd3"
],
"<lvalue>": [
"l1",
"l2",
"l3"
],
"<fills>": [
"<fill>",
"<fill><whitespace><fills>"
],
"<fill>": [
"account<whitespace><accvalue>",
"username<whitespace><uservalue>",
"password<whitespace><passvalue>",
"login<whitespace><lvalue>"
]
}
xxxxxxxxxx
netrc_samples = [i.strip().replace('\n', ' ') for i in [
'''
machine m1 login u1 password pwd1
''','''
machine m2 login u1 password pwd2
''','''
default login u1 password pwd1
''','''
machine m1 login u2 password pwd1
''','''
machine m2 login u2 password pwd2 machine m1 login l1 password pwd1
''','''
machine m1 login u1 password pwd1 machine m2 login l2 password pwd2
''','''
machine m2 password pwd2 login u2
''','''
machine m1 password pwd1 login u1
''','''
machine m2 login u2 password pwd1
''','''
default login u2 password pwd3
''','''
machine m2 login u2 password pwd1 machine m3 login u3 password pwd1 machine m1 login u1 password pwd2
''','''
machine m2 login u2 password pwd3
machine m1 login u1 password pwd1
''','''
default login u1 password pwd3
machine m2 login u1 password pwd1
''','''
machine m1 login l1 password p1
machine m2 login l2 password p2
default login m1 password p1
''']]
As with `urlparse`, we had to use a restricted set of keywords for _netrc_. The samples below were produced by fuzzing the golden grammar, and are captured here for deterministic reproduction.
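The way such samples are derived from the golden grammar can be sketched as a tiny random fuzzer. This is a minimal illustration, not Mimid's own fuzzer: the grammar subset, the `fuzz` function, and the depth cap below are ours.

```python
import random
import re

# A subset of the golden netrc grammar above; the nonterminal names follow
# the text, but the restriction to machine entries is ours, for brevity.
GRAMMAR = {
    "<START>": ["<entries>"],
    "<entries>": ["<entry>", "<entry><whitespace><entries>"],
    "<entry>": ["machine<whitespace><mvalue><whitespace><fills>"],
    "<whitespace>": [" "],
    "<mvalue>": ["m1", "m2", "m3"],
    "<fills>": ["<fill>", "<fill><whitespace><fills>"],
    "<fill>": ["login<whitespace><lvalue>", "password<whitespace><passvalue>"],
    "<lvalue>": ["l1", "l2", "l3"],
    "<passvalue>": ["pwd1", "pwd2", "pwd3"],
}
NONTERMINAL = re.compile(r'(<[^<> ]+>)')

def fuzz(key, grammar, rng, depth=0):
    # Past a depth budget, always pick the shortest rule so that recursive
    # nonterminals such as <fills> and <entries> terminate.
    rules = grammar[key]
    if depth > 10:
        rules = [min(rules, key=len)]
    expansion = rng.choice(rules)
    return ''.join(fuzz(tok, grammar, rng, depth + 1) if tok in grammar else tok
                   for tok in NONTERMINAL.split(expansion))

rng = random.Random(0)
samples = [fuzz('<START>', GRAMMAR, rng) for _ in range(5)]
```

With a fixed seed the generated inputs are reproducible, which is what "deterministic reproduction" refers to: rather than re-running the fuzzer, the produced samples are captured verbatim.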
xxxxxxxxxx
netrc_samples = [i.strip() for i in '''
machine m1 password pwd3 login l3
machine m1 login l3 account a3 login l1 password pwd2
machine m2 password pwd2
machine m2 password pwd2 account a2
machine m2 password pwd3
machine m2 password pwd1
machine m1 login l3 password pwd1
machine m2 password pwd3
machine m1 password pwd2 account a1 account a2
machine m2 password pwd3
machine m2 account a1 password pwd3
machine m3 login l3 account a2 password pwd3
machine m2 password pwd2 login l3 password pwd2 password pwd2
machine m3 password pwd2 login l3
machine m3 login l3 account a3 account a2 password pwd3
machine m1 password pwd2
machine m2 account a3 password pwd3
machine m3 password pwd2
machine m3 password pwd1 account a1
machine m2 password pwd1
machine m1 account a1 password pwd1
machine m2 login l1 login l2 account a2 login l3 password pwd2 password pwd2 password pwd2
machine m3 account a3 login l3 account a1 password pwd3
machine m1 password pwd1
machine m2 password pwd3
machine m2 password pwd3
machine m1 account a1 password pwd1 account a1 password pwd3
machine m3 password pwd3
machine m3 password pwd2
machine m2 account a1 account a1 account a2 password pwd2 account a1
machine m3 password pwd1 login l2 login l1
machine m1 account a3 account a3 password pwd1 machine m3 password pwd2
machine m1 login l1 password pwd1
machine m3 password pwd2 login l1 machine m1 password pwd2
machine m3 account a2 password pwd1
machine m1 password pwd3
machine m3 login l2 account a2 password pwd2
machine m2 password pwd3 machine m2 account a1 login l3 password pwd3 password pwd2
machine m1 password pwd2
machine m1 password pwd2
machine m1 password pwd2
machine m2 password pwd3 password pwd2
machine m2 login l1 password pwd1 account a1
machine m3 password pwd1
machine m2 password pwd3 password pwd1
machine m1 password pwd3 password pwd3 password pwd1
machine m2 password pwd1 password pwd1
machine m2 login l2 account a3 password pwd3
machine m1 password pwd1
machine m1 account a3 password pwd3 account a2 password pwd2 account a3 account a3 account a3
machine m3 password pwd3 password pwd3 machine m2 password pwd3
machine m2 password pwd2 login l2 login l1
machine m1 login l3 password pwd2
machine m2 login l2 password pwd1
machine m2 account a3 password pwd2
machine m1 account a2 password pwd1
machine m3 login l1 password pwd2 account a2
machine m1 password pwd3
machine m3 password pwd2
machine m1 password pwd3 password pwd3 password pwd1 machine m2 password pwd3
machine m1 account a2 account a1 login l2 password pwd2
machine m1 login l1 password pwd2 password pwd2 login l3
machine m2 password pwd1 password pwd2
machine m1 password pwd3 account a3
machine m1 login l1 login l2 password pwd2
machine m1 account a1 password pwd1 login l2
machine m2 password pwd1 login l3
machine m2 password pwd2 password pwd1 password pwd3
machine m1 password pwd1 account a1 account a2 login l1
machine m1 password pwd3
machine m2 login l3 password pwd3
machine m3 login l2 login l2 password pwd1 login l2
machine m2 password pwd1
machine m1 password pwd1 login l3 account a2 login l3 password pwd1
machine m3 password pwd3
machine m3 password pwd1 account a1
machine m2 login l3 password pwd1 account a3
machine m3 password pwd3
machine m2 password pwd1
machine m1 login l3 password pwd1 password pwd1
machine m3 password pwd3
machine m2 password pwd2 login l3 login l2 login l1 account a1
machine m1 password pwd1
machine m2 password pwd2 login l3
machine m2 password pwd2
machine m2 password pwd1
machine m3 password pwd3
machine m1 password pwd1
machine m2 account a3 password pwd1
machine m2 login l1 password pwd3
machine m3 password pwd2 login l1 machine m2 password pwd1
machine m2 login l2 account a2 password pwd1 login l2 account a1
machine m1 password pwd2
machine m3 login l1 password pwd1
machine m3 account a2 password pwd2
machine m2 login l1 password pwd3 login l2 account a2
machine m3 account a1 password pwd2
machine m3 login l3 login l3 password pwd1 password pwd1
machine m3 password pwd2 password pwd2 password pwd2 account a2
machine m3 password pwd1
'''.strip().split('\n') if i.strip()]
xxxxxxxxxx
%%time
with timeit() as t:
    netrc_grammar = accio_grammar('netrc.py', VARS['netrc_src'], netrc_samples)
Mimid_t['netrc.py'] = t.runtime
CPU times: user 1.23 s, sys: 1.72 s, total: 2.95 s Wall time: 30.4 s
xxxxxxxxxx
save_grammar(netrc_grammar, 'mimid', 'netrc')
{'<START>': ['<_parse__netrc-0-c>'],
'<_parse__netrc-0-c>': ['<_parse__netrc-29>',
'machine <_parse__netrc-22><_parse__netrc-59><_parse__netrc-29>',
'machine m2 password pwd3 machine m2 <_parse__netrc-23>password pwd2'],
'<_parse__netrc-29>': ['machine <_parse__netrc-67><_parse__netrc-32>',
'machine <_parse__netrc-67><_parse__netrc-56>'],
'<_parse__netrc-22>': ['m1 ', 'm3 '],
'<_parse__netrc-59>': ['<_parse__netrc-60>',
'<_parse__netrc-64><_parse__netrc-65>'],
'<_parse__netrc-23>': ['account a1 ', 'login l3 ', 'password pwd3 '],
'<_parse__netrc-67>': ['m1 ', 'm2 ', 'm3 '],
'<_parse__netrc-32>': ['<_parse__netrc-34><_parse__netrc-35>',
'<_parse__netrc-47><_parse__netrc-48>',
'<_parse__netrc>'],
'<_parse__netrc-56>': ['<_parse__netrc-8>',
'<_parse__netrc-8><_parse__netrc-56>'],
'<_parse__netrc-34>': ['<_parse__netrc-5>',
'<_parse__netrc-5><_parse__netrc-34>'],
'<_parse__netrc-35>': ['<_parse__netrc-37><_parse__netrc-38>',
'<_parse__netrc>'],
'<_parse__netrc-47>': ['<_parse__netrc-8>',
'<_parse__netrc-8><_parse__netrc-47>'],
'<_parse__netrc-48>': ['<_parse__netrc-16>',
'<_parse__netrc-50><_parse__netrc-51>'],
'<_parse__netrc>': ['password <_parse__netrc-19>'],
'<_parse__netrc-5>': ['account <_parse__netrc-28>',
'login <_parse__netrc-27>'],
'<_parse__netrc-28>': ['a1 ', 'a2 ', 'a3 '],
'<_parse__netrc-27>': ['l1 ', 'l2 ', 'l3 '],
'<_parse__netrc-37>': ['<_parse__netrc-8>',
'<_parse__netrc-8><_parse__netrc-37>'],
'<_parse__netrc-38>': ['<_parse__netrc-16>',
'<_parse__netrc-40><_parse__netrc-41>'],
'<_parse__netrc-8>': ['password <_parse__netrc-2>'],
'<_parse__netrc-2>': ['pwd1', 'pwd1 ', 'pwd2', 'pwd2 ', 'pwd3', 'pwd3 '],
'<_parse__netrc-16>': ['<_parse__netrc>',
'account <_parse__netrc-20>',
'login <_parse__netrc-9>'],
'<_parse__netrc-40>': ['<_parse__netrc-5>',
'<_parse__netrc-5><_parse__netrc-40>'],
'<_parse__netrc-41>': ['<_parse__netrc-16>',
'<_parse__netrc-43><_parse__netrc-45><_parse__netrc-16>'],
'<_parse__netrc-20>': ['a1', 'a2', 'a3'],
'<_parse__netrc-9>': ['l1', 'l2', 'l3'],
'<_parse__netrc-43>': ['<_parse__netrc-8>',
'<_parse__netrc-8><_parse__netrc-43>'],
'<_parse__netrc-45>': ['<_parse__netrc-5>',
'<_parse__netrc-5><_parse__netrc-45>'],
'<_parse__netrc-50>': ['<_parse__netrc-5>',
'<_parse__netrc-5><_parse__netrc-50>'],
'<_parse__netrc-51>': ['<_parse__netrc-16>',
'<_parse__netrc-8><_parse__netrc-16>'],
'<_parse__netrc-19>': ['pwd1', 'pwd2', 'pwd3'],
'<_parse__netrc-60>': ['<_parse__netrc-61>',
'<_parse__netrc-61><_parse__netrc-62>'],
'<_parse__netrc-64>': ['<_parse__netrc-68>',
'<_parse__netrc-68><_parse__netrc-64>'],
'<_parse__netrc-65>': ['<_parse__netrc-4>',
'<_parse__netrc-4><_parse__netrc-65>'],
'<_parse__netrc-61>': ['<_parse__netrc-4>',
'<_parse__netrc-4><_parse__netrc-61>'],
'<_parse__netrc-62>': ['<_parse__netrc-68>',
'<_parse__netrc-68><_parse__netrc-62>'],
'<_parse__netrc-4>': ['password <_parse__netrc-66>'],
'<_parse__netrc-66>': ['pwd1 ', 'pwd2 ', 'pwd3 '],
'<_parse__netrc-68>': ['account a3 ', 'login l1 ']}
xxxxxxxxxx
if 'netrc' in CHECK:
    result = check_precision('netrc.py', netrc_grammar)
    Mimid_p['netrc.py'] = result
    print(result)
(773, 1000)
xxxxxxxxxx
!cp build/mylex.py .
!cp build/myio.py .
xxxxxxxxxx
import subjects.netrc
xxxxxxxxxx
if 'netrc' in CHECK:
    result = check_recall(netrc_golden, netrc_grammar, subjects.netrc.main)
    Mimid_r['netrc.py'] = result
    print(result)
(949, 1000)
xxxxxxxxxx
%%time
with timeit() as t:
    autogram_netrc_grammar_t = recover_grammar_with_taints('netrc.py', VARS['netrc_src'], netrc_samples)
Autogram_t['netrc.py'] = t.runtime
CPU times: user 15.5 ms, sys: 8.65 ms, total: 24.2 ms Wall time: 2min 59s
xxxxxxxxxx
save_grammar(autogram_netrc_grammar_t, 'autogram_t', 'netrc')
{'<START>': ['<create@27:self>'],
'<create@27:self>': ['<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@83:login>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@85:account> login l3 password pwd1',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@83:login> <create@27:self>chine <_parse@47:entryname> password <_parse@108:password>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@83:login> <create@27:self>chine <_parse@47:entryname> password pwd2',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@83:login> login <_parse@83:login>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@83:login> login <_parse@83:login> login <_parse@83:login> <_parse@70:tt> <_parse@85:account>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@83:login> password pwd2 password pwd2',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@85:account>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@85:account> account <_parse@85:account>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@85:account> account <_parse@85:account> <_parse@70:tt> <_parse@83:login>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <create@27:self>chine m2 <_parse@70:tt> <_parse@85:account> <_parse@70:tt> <_parse@83:login> password pwd3 password <_parse@108:password>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> password <_parse@108:password>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> password <_parse@108:password> password <_parse@108:password>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> password pwd1',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> password pwd2 password pwd2 <_parse@70:tt> <_parse@85:account>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> password pwd3 <create@27:self>chine <_parse@47:entryname> password pwd3',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> password pwd3 password <_parse@108:password>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> password pwd3 password <_parse@108:password> <create@27:self>chine <_parse@47:entryname> password pwd3',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@108:password>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@85:account>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@108:password> login <_parse@83:login> <_parse@70:tt> <_parse@85:account>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@108:password> password pwd1',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@108:password> password pwd2 login <_parse@83:login>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@85:account> <_parse@70:tt> <_parse@108:password>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@85:account> <_parse@70:tt> <_parse@108:password> login l2 account <_parse@85:account>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@85:account> account <_parse@85:account> <_parse@70:tt> <_parse@108:password>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@85:account> login <_parse@83:login> <_parse@70:tt> <_parse@108:password>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> login <_parse@83:login> <_parse@70:tt> <_parse@108:password>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> login <_parse@83:login> <_parse@70:tt> <_parse@85:account> login <_parse@83:login> <_parse@70:tt> <_parse@108:password> password pwd2 password pwd2',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> login l2 <_parse@70:tt> <_parse@108:password> login l2',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> login l3 <_parse@70:tt> <_parse@108:password> password pwd1',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@85:account> <_parse@70:tt> <_parse@108:password>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@85:account> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@83:login>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@85:account> <_parse@70:tt> <_parse@108:password> account <_parse@85:account> password <_parse@108:password> account a3 account a3 account a3',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@85:account> <_parse@70:tt> <_parse@108:password> account a1 password <_parse@108:password>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@85:account> <_parse@70:tt> <_parse@83:login> account <_parse@85:account> <_parse@70:tt> <_parse@108:password>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@85:account> account <_parse@85:account> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@108:password>',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@85:account> account a1 account <_parse@85:account> <_parse@70:tt> <_parse@108:password> account a1',
'<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@85:account> account a3 <_parse@70:tt> <_parse@108:password> <create@27:self>chine <_parse@47:entryname> password <_parse@108:password>',
'm',
'ma'],
'<read_token@137:nextchar>': [' ',
'<__add__@1115:other>',
'<create@27:self>'],
'<_parse@47:entryname>': ['m1', 'm2', 'm3'],
'<_parse@70:tt>': ['account', 'login', 'password'],
'<_parse@108:password>': ['pwd1', 'pwd2', 'pwd3'],
'<_parse@83:login>': ['l1', 'l2', 'l3'],
'<_parse@85:account>': ['a1', 'a2', 'a3'],
'<__add__@1115:other>': ['a', 'c', 'e', 'h', 'i', 'n']}
xxxxxxxxxx
if 'netrc' in CHECK:
    result = check_precision('netrc.py', autogram_netrc_grammar_t)
    Autogram_p['netrc.py'] = result
    print(result)
(30, 1000)
xxxxxxxxxx
if 'netrc' in CHECK:
    result = check_recall(netrc_golden, autogram_netrc_grammar_t, subjects.netrc.main)
    Autogram_r['netrc.py'] = result
    print(result)
(773, 1000)
This is done through `json.tar.gz`
xxxxxxxxxx
# json samples
json_samples = [i.strip().replace('\n', ' ') for i in ['''
{"abcd":[],
"efgh":{"y":[],
"pqrstuv": null,
"p": "",
"q":"" ,
"r": "" ,
"float1": 1.0,
"float2":1.0,
"float3":1.0 ,
"float4": 1.0 ,
"_124": {"wx" : null,
"zzyym!!2@@39": [1.1, 2452, 398, {"x":[[4,53,6,[7 ,8,90 ],10]]}]} }
}
''',
'''
{"mykey1": [1, 2, 3], "mykey2": null, "mykey":"'`:{}<>&%[]\\\\^~|$'"}
''','''
{"emptya": [], "emptyh": {}, "emptystr":"", "null":null}
''', '''
[
"JSON Test Pattern pass1",
{"object with 1 member":["array with 1 element"]},
{},
[],
-42,
true,
false,
null,
{
"integer": 1234567890,
"real": -9876.543210,
"e": 0.123456789e-12,
"E": 1.234567890E+34,
"": 23456789012E66,
"zero": 0,
"one": 1,
"space": " ",
"quote": "\\"",
"backslash": "\\\\",
"controls": "\\b\\f\\n\\r\\t",
"slash": "/ & \\/",
"alpha": "abcdefghijklmnopqrstuvwyz",
"ALPHA": "ABCDEFGHIJKLMNOPQRSTUVWYZ",
"digit": "0123456789",
"0123456789": "digit",
"special": "`1~!@#$%^&*()_+-={':[,]}|;.</>?",
"true": true,
"false": false,
"null": null,
"array":[ ],
"object":{ },
"address": "50 St. James Street",
"url": "http://www.JSON.org/",
"comment": "// /* <!-- --",
"# -- --> */": " ",
" s p a c e d " :[1,2 , 3
,
4 , 5 , 6 ,7 ],"compact":[1,2,3,4,5,6,7],
"jsontext": "{\\"object with 1 member\\":[\\"array with 1 element\\"]}",
"quotes": "" %22 0x22 034 "",
"\\/\\\\\\"\\b\\f\\n\\r\\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?"
: "A key can be any string"
},
0.5 ,98.6
,
99.44
,
1066,
1e1,
0.1e1,
1e-1,
1e00,2e+00,2e-00
,"rosebud"]
''', '''
{"menu":
{
"id": "file",
"value": "File",
"popup": {
"menuitem": [
{"value": "New", "onclick": "CreateNewDoc()"},
{"value": "Open", "onclick": "OpenDoc()"},
{"value": "Close", "onclick": "CloseDoc()"}
]
}
}
}
''', '''
{
"XMDEwOlJlcG9zaXRvcnkxODQ2MjA4ODQ=": "-----BEGIN PGP SIGNATURE-----\n\niQIzBAABAQAdFiEESn/54jMNIrGSE6Tp6cQjvhfv7nAFAlnT71cACgkQ6cQjvhfv\n7nCWwA//XVqBKWO0zF+ bZl6pggvky3Oc2j1pNFuRWZ29LXpNuD5WUGXGG209B0hI\nDkmcGk19ZKUTnEUJV2Xd0R7AW01S/YSub7OYcgBkI7qUE13FVHN5ln1KvH2all2n\n2+JCV1HcJLEoTjqIFZSSu/sMdhkLQ9/NsmMAzpf/ iIM0nQOyU4YRex9eD1bYj6nA\nOQPIDdAuaTQj1gFPHYLzM4zJnCqGdRlg0sOM/zC5apBNzIwlgREatOYQSCfCKV7k\nnrU34X8b9BzQaUx48Qa+Dmfn5KQ8dl27RNeWAqlkuWyv3pUauH9UeYW+KyuJeMkU\n+ NyHgAsWFaCFl23kCHThbLStMZOYEnGagrd0hnm1TPS4GJkV4wfYMwnI4KuSlHKB\njHl3Js9vNzEUQipQJbgCgTiWvRJoK3ENwBTMVkKHaqT4x9U4Jk/ XZB6Q8MA09ezJ\n3QgiTjTAGcum9E9QiJqMYdWQPWkaBIRRz5cET6HPB48YNXAAUsfmuYsGrnVLYbG+ \nUpC6I97VybYHTy2O9XSGoaLeMI9CsFn38ycAxxbWagk5mhclNTP5mezIq6wKSwmr\nX11FW3n1J23fWZn5HJMBsRnUCgzqzX3871IqLYHqRJ/bpZ4h20RhTyPj5c/z7QXp\neSakNQMfbbMcljkha+ ZMuVQX1K9aRlVqbmv3ZMWh+OijLYVU2bc=\n=5Io4\n-----END PGP SIGNATURE-----\n"
}
''', '''
{"widget":
{
"debug": "on",
"window": {
"title": "Sample Konfabulator Widget",
"name": "main_window",
"width": 500,
"height": 500
},
"image": {
"src": "Images/Sun.png",
"name": "sun1",
"hOffset": 250,
"vOffset": 250,
"alignment": "center"
},
"text": {
"data": "Click Here",
"size": 36,
"style": "bold",
"name": "text1",
"hOffset": 250,
"vOffset": 100,
"alignment": "center",
"onMouseUp": "sun1.opacity = (sun1.opacity / 100) * 90;"
}
}
}
''',
'''
{
"fruit": "Apple",
"size": "Large",
"color": "Red",
"product": "Jam"
}
''',
'''
{"menu":
{
"header": "SVG Viewer",
"items": [
{"id": "Open"},
{"id": "OpenNew", "label": "Open New"},
null,
{"id": "ZoomIn", "label": "Zoom In"},
{"id": "ZoomOut", "label": "Zoom Out"},
{"id": "OriginalView", "label": "Original View"},
null,
{"id": "Quality"},
{"id": "Pause"},
{"id": "Mute"},
null,
{"id": "Find", "label": "Find..."},
{"id": "FindAgain", "label": "Find Again"},
{"id": "Copy"},
{"id": "CopyAgain", "label": "Copy Again"},
{"id": "CopySVG", "label": "Copy SVG"},
{"id": "ViewSVG", "label": "View SVG"},
{"id": "ViewSource", "label": "View Source"},
{"id": "SaveAs", "label": "Save As"},
null,
{"id": "Help"},
{"id": "About", "label": "About Adobe CVG Viewer..."}
]
}}
''',
'''
{
"quiz": {
"sport": {
"q1": {
"question": "Which one is correct team name in NBA?",
"options": [
"New York Bulls",
"Los Angeles Kings",
"Golden State Warriros",
"Huston Rocket"
],
"answer": "Huston Rocket"
}
},
"maths": {
"q1": {
"question": "5 + 7 = ?",
"options": [
"10",
"11",
"12",
"13"
],
"answer": "12"
},
"q2": {
"question": "12 - 8 = ?",
"options": [
"1",
"2",
"3",
"4"
],
"answer": "4"
}
}
}
}
''',
'''
{
"colors":
[
{
"color": "black",
"category": "hue",
"type": "primary",
"code": {
"rgba": [255,255,255,1],
"hex": "#000"
}
},
{
"color": "white",
"category": "value",
"code": {
"rgba": [0,0,0,1],
"hex": "#FFF"
}
},
{
"color": "red",
"category": "hue",
"type": "primary",
"code": {
"rgba": [255,0,0,1],
"hex": "#FF0"
}
},
{
"color": "blue",
"category": "hue",
"type": "primary",
"code": {
"rgba": [0,0,255,1],
"hex": "#00F"
}
},
{
"color": "yellow",
"category": "hue",
"type": "primary",
"code": {
"rgba": [255,255,0,1],
"hex": "#FF0"
}
},
{
"color": "green",
"category": "hue",
"type": "secondary",
"code": {
"rgba": [0,255,0,1],
"hex": "#0F0"
}
}
]
}
''',
'''
{
"aliceblue": "#f0f8ff",
"antiquewhite": "#faebd7",
"aqua": "#00ffff",
"aquamarine": "#7fffd4",
"azure": "#f0ffff",
"beige": "#f5f5dc",
"bisque": "#ffe4c4",
"black": "#000000",
"blanchedalmond": "#ffebcd",
"blue": "#0000ff",
"blueviolet": "#8a2be2",
"brown": "#a52a2a",
"majenta": "#ff0ff"
}
''']]
xxxxxxxxxx
%%time
with timeit() as t:
    microjson_grammar = accio_grammar('microjson.py', VARS['microjson_src'], json_samples)
Mimid_t['microjson.py'] = t.runtime
CPU times: user 1min 39s, sys: 19.1 s, total: 1min 58s Wall time: 6min 47s
xxxxxxxxxx
save_grammar(microjson_grammar, 'mimid', 'microjson')
{'<START>': ['<_from_json_raw>'],
'<_from_json_raw>': ['<_from_json_number-1>',
'<_from_json_raw-3>',
'<_from_json_raw-4>',
'<_from_json_raw-5>',
'<_skip-1-s><_from_json_raw-2>',
'false',
'null',
'true'],
'<_from_json_number-1>': ['<_from_json_number-1-s>',
'<_from_json_number-1-s>e<_from_json_number-3-s>'],
'<_from_json_raw-3>': ['[<_from_json_list-0-c>'],
'<_from_json_raw-4>': ['{<_from_json_dict-0-c>'],
'<_from_json_raw-5>': ['"<_from_json_string-0-c>'],
'<_skip-1-s>': [' ', ' <_skip-1-s>'],
'<_from_json_raw-2>': ['<_from_json_number-1>',
'<_from_json_raw-3>',
'<_from_json_raw-4>',
'<_from_json_raw-5>',
'false',
'null',
'true'],
'<_from_json_number-1-s>': ['<_from_json_number>',
'<_from_json_number><_from_json_number-1-s>'],
'<_from_json_number-3-s>': ['<_from_json_number>',
'<_from_json_number><_from_json_number-3-s>'],
'<_from_json_number>': ['+',
'-',
'.',
'0',
'1',
'2',
'3',
'4',
'5',
'6',
'7',
'8',
'9',
'E',
'e'],
'<_from_json_list-0-c>': ['<_from_json_list-10><_from_json_list-1-c>',
'<_from_json_list-12>',
'<_from_json_list-5-s><_from_json_list-6-s><_from_json_list-6-c>'],
'<_from_json_list-10>': ['<_from_json_raw>', '<_skip-1-s><_from_json_raw>'],
'<_from_json_list-1-c>': ['<_from_json_list-12>',
'<_from_json_list-2-s><_from_json_list-12>'],
'<_from_json_list-12>': ['<_skip-1-s>]', ']'],
'<_from_json_list-5-s>': ['<_from_json_list-7>',
'<_from_json_list-7><_from_json_list-5-s>'],
'<_from_json_list-6-s>': ['<_from_json_list>',
'<_from_json_list><_from_json_list-6-s>'],
'<_from_json_list-6-c>': ['<_from_json_list-12>',
'<_from_json_list-8-s><_from_json_list-9-s><_from_json_list-12>'],
'<_from_json_list-2-s>': ['<_from_json_list-7>',
'<_from_json_list-7><_from_json_list-2-s>'],
'<_from_json_list-7>': ['<_from_json_list-3>',
'<_from_json_list-4>',
'<_from_json_raw>'],
'<_from_json_list-3>': [',<_from_json_raw>'],
'<_from_json_list-4>': ['<_skip-1-s><_from_json_list-3>'],
'<_from_json_list>': ['<_from_json_list-3>', '<_from_json_list-4>'],
'<_from_json_list-8-s>': ['<_from_json_list-7>',
'<_from_json_list-7><_from_json_list-8-s>'],
'<_from_json_list-9-s>': ['<_from_json_list>',
'<_from_json_list><_from_json_list-9-s>'],
'<_from_json_dict-0-c>': ['<_from_json_dict-1>',
'<_from_json_dict-3-s><_from_json_dict-1>',
'<_from_json_dict-5>'],
'<_from_json_dict-1>': ['<_from_json_dict><_from_json_dict-5>'],
'<_from_json_dict-3-s>': ['<_from_json_dict><_from_json_dict-7>',
'<_from_json_dict><_from_json_dict-7><_from_json_dict-3-s>'],
'<_from_json_dict-5>': ['<_skip-1-s>}', '}'],
'<_from_json_dict>': ['<_from_json_dict-4>',
'<_skip-1-s><_from_json_dict-4>'],
'<_from_json_dict-4>': ['"<_from_json_string-0-c><_from_json_dict-10>'],
'<_from_json_string-0-c>': ['"', '<_from_json_string-1-s>"'],
'<_from_json_dict-10>': ['<_from_json_dict-12>',
'<_skip-1-s><_from_json_dict-12>'],
'<_from_json_string-1-s>': ['<_from_json_string>',
'<_from_json_string><_from_json_string-1-s>'],
'<_from_json_string>': [' ',
'!',
'#',
'$',
'%',
'&',
"'",
'(',
')',
'*',
'+',
',',
'-',
'.',
'/',
'0',
'1',
'2',
'3',
'4',
'5',
'6',
'7',
'8',
'9',
':',
';',
'<',
'=',
'>',
'?',
'@',
'A',
'B',
'C',
'D',
'E',
'F',
'G',
'H',
'I',
'J',
'K',
'L',
'M',
'N',
'O',
'P',
'Q',
'R',
'S',
'T',
'U',
'V',
'W',
'X',
'Y',
'Z',
'[',
'\\<decode_escape-0-c>',
']',
'^',
'_',
'`',
'a',
'b',
'c',
'd',
'e',
'f',
'g',
'h',
'i',
'j',
'k',
'l',
'm',
'n',
'o',
'p',
'q',
'r',
's',
't',
'u',
'v',
'w',
'x',
'y',
'z',
'{',
'|',
'}',
'~'],
'<decode_escape-0-c>': ['"', '/', '\\', 'b', 'f', 'n', 'r', 't'],
'<_from_json_dict-12>': [':<_from_json_list-10>'],
'<_from_json_dict-7>': [',', '<_skip-1-s>,']}
if 'microjson' in CHECK:
    result = check_precision('microjson.py', microjson_grammar)
    Mimid_p['microjson.py'] = result
    print(result)
(924, 1000)
import subjects.microjson
import pathlib
def slurp(fn):
    with open(fn) as f:
        s = f.read()
    return s.replace('\n', ' ').strip()
if shutil.which('gzcat'):
    !gzcat json.tar.gz | tar -xpf -
elif shutil.which('zcat'):
    !zcat json.tar.gz | tar -xpf -
else:
    assert False
json_path = pathlib.Path('recall/json')
json_files = [i.as_posix() for i in json_path.glob('**/*.json')]
json_samples_2 = [slurp(i) for i in json_files]
def check_recall_samples(samples, my_grammar, validator, log=False):
    n_max = len(samples)
    ie = IterativeEarleyParser(my_grammar, start_symbol='<START>')
    my_samples = list(samples)
    count = 0
    while my_samples:
        src, *my_samples = my_samples
        try:
            validator(src)
            try:
                # JSON files are much larger because they are real-world documents
                for tree in ie.parse(src):
                    count += 1
                    break
                if log: print('+', repr(src), count, file=sys.stderr)
            except:
                if log: print('-', repr(src), file=sys.stderr)
        except:
            pass
    return (count, n_max)
if 'microjson' in CHECK:
    result = check_recall_samples(json_samples_2, microjson_grammar, subjects.microjson.main)
    Mimid_r['microjson.py'] = result
    print(result)
(93, 100)
%%time
with timeit() as t:
    autogram_microjson_grammar_t = recover_grammar_with_taints('microjson.py', VARS['microjson_src'], json_samples)
Autogram_t['microjson.py'] = t.runtime
CPU times: user 16.4 ms, sys: 16.6 ms, total: 33 ms
Wall time: 7min 30s
save_grammar(autogram_microjson_grammar_t, 'autogram_t', 'microjson')
{'<START>': ['<tell@115:self.buf>'],
'<tell@115:self.buf>': ['<_skip@76:c> "JSON Test Pattern pass1", {"object with 1 member":<_from_json_raw@283:c>"array with 1 element"]}, {}, [], -42, true, false, null, { "integer": 1234567890, "real": -9876.543210, "e": 0.123456789e-12, "E": 1.234567890E+34, "": 23456789012E66, "zero": 0, "one": 1, "space": " ", "quote": "\\"", "backslash": "\\\\", "controls": "\\b\\f\\n\\r\\t", "slash": "/ & \\/", "alpha": "abcdefghijklmnopqrstuvwyz", "ALPHA": "ABCDEFGHIJKLMNOPQRSTUVWYZ", "digit": "0123456789", "0123456789": "digit", "special": "`1~!@#$%^&*()_+-={\':[,]}|;.<openA>/<closeA>?", "true": true, "false": false, "null": null, "array":[ ], "object":{ }, "address": "50 St. James Street", "url": "http://www.JSON.org/", "comment": "// /* <!-- --", "# -- --> */": " ", " s p a c e d " :[1,2 , 3 , 4 , 5 , 6 ,7 ],"compact":[1,2,3,4,5,6,7], "jsontext": "{\\"object with 1 member\\":[\\"array with 1 element\\"]}", "quotes": "" %22 0x22 034 "", "\\/\\\\\\"\\b\\f\\n\\r\\t`1~!@#$%^&*()_+-=[]{}|;:\',./<openA><closeA>?" : "A key can be any string" }, 0.5 ,98.6 , 99.44 , 1066, 1e1, 0.1e1, 1e-1, 1e00,2e+00,2e-00 ,"rosebud"]',
'<_skip@76:c> "fruit": "<from_json@313:v.fruit>", "size": "<from_json@313:v.size>", "color": "<from_json@313:v.color>", "product": "<from_json@313:v.product>" }',
'<_skip@76:c> "quiz": <_from_json_raw@283:c> "sport": { "q1": { "question": "<from_json@313:v.quiz.sport.q1.question>", "options": [ "New York Bulls", "Los Angeles Kings", "Golden State Warriros", "<from_json@313:v.quiz.sport.q1.answer>" ], "answer": "Huston Rocket" } }, "maths": { "q1": { "question": "<from_json@313:v.quiz.maths.q1.question>", "options": [ "10", "11", "<from_json@313:v.quiz.maths.q1.answer>", "13" ], "answer": "12" }, "q2": { "question": "<from_json@313:v.quiz.maths.q2.question>", "options": [ "1", "2", "3", "<from_json@313:v.quiz.maths.q2.answer>" ], "answer": "4" } } } }',
'<_skip@76:c> "XMDEwOlJlcG9zaXRvcnkxODQ2MjA4ODQ=": "<from_json@313:v.xmdewoljlcg9zaxrvcnkxodq2mja4odq=>" }',
'<_skip@76:c> "aliceblue": "<from_json@313:v.aliceblue>", "antiquewhite": "<from_json@313:v.antiquewhite>", "aqua": "<from_json@313:v.aqua>", "aquamarine": "<from_json@313:v.aquamarine>", "azure": "<from_json@313:v.azure>", "beige": "<from_json@313:v.beige>", "bisque": "<from_json@313:v.bisque>", "black": "<from_json@313:v.black>", "blanchedalmond": "<from_json@313:v.blanchedalmond>", "blue": "<from_json@313:v.blue>", "blueviolet": "<from_json@313:v.blueviolet>", "brown": "<from_json@313:v.brown>", "majenta": "<from_json@313:v.majenta>" }',
'<_skip@76:c> "colors": [ <_from_json_raw@283:c> "color": "black", "category": "hue", "type": "primary", "code": { "rgba": [255,255,255,1], "hex": "#000" } }, { "color": "white", "category": "value", "code": { "rgba": [0,0,0,1], "hex": "#FFF" } }, { "color": "red", "category": "hue", "type": "primary", "code": { "rgba": [255,0,0,1], "hex": "#FF0" } }, { "color": "blue", "category": "hue", "type": "primary", "code": { "rgba": [0,0,255,1], "hex": "#00F" } }, { "color": "yellow", "category": "hue", "type": "primary", "code": { "rgba": [255,255,0,1], "hex": "#FF0" } }, { "color": "green", "category": "hue", "type": "secondary", "code": { "rgba": [0,255,0,1], "hex": "#0F0" } } ] }',
'<_skip@76:c>"abcd":[], "efgh":<_from_json_raw@283:c>"y":[], "pqrstuv": <from_json@313:v.efgh._124.wx>, "p": "", "q":"" , "r": "" , "float1": <from_json@313:v.efgh.float4>, "float2":1.0, "float3":1.0 , "float4": 1.0 , "_124": {"wx" : null, "zzyym!!2@@39": [1.1, 2452, 398, {"x":[[4,53,6,[7 ,8,90 ],10]]}]} } }',
'<_skip@76:c>"emptya": [], "emptyh": <_from_json_raw@283:c>}, "emptystr":"", "<from_json@313:v.null>":null}',
'<_skip@76:c>"menu": <_from_json_raw@283:c> "header": "<from_json@313:v.menu.header>", "items": [ {"id": "Open"}, {"id": "OpenNew", "label": "Open New"}, null, {"id": "ZoomIn", "label": "Zoom In"}, {"id": "ZoomOut", "label": "Zoom Out"}, {"id": "OriginalView", "label": "Original View"}, null, {"id": "Quality"}, {"id": "Pause"}, {"id": "Mute"}, null, {"id": "Find", "label": "Find..."}, {"id": "FindAgain", "label": "Find Again"}, {"id": "Copy"}, {"id": "CopyAgain", "label": "Copy Again"}, {"id": "CopySVG", "label": "Copy SVG"}, {"id": "ViewSVG", "label": "View SVG"}, {"id": "ViewSource", "label": "View Source"}, {"id": "SaveAs", "label": "Save As"}, null, {"id": "Help"}, {"id": "About", "label": "About Adobe CVG Viewer..."} ] }}',
'<_skip@76:c>"menu": <_from_json_raw@283:c> "id": "<from_json@313:v.menu.id>", "value": "<from_json@313:v.menu.value>", "popup": { "menuitem": [ {"value": "New", "onclick": "CreateNewDoc()"}, {"value": "Open", "onclick": "OpenDoc()"}, {"value": "Close", "onclick": "CloseDoc()"} ] } } }',
'<_skip@76:c>"mykey1": [1, 2, 3], "mykey2": <from_json@313:v.mykey2>, "mykey":"\'`:<_from_json_raw@283:c>}<openA><closeA>&%[]\\\\^~|$\'"}',
'<_skip@76:c>"widget": <_from_json_raw@283:c> "debug": "<from_json@313:v.widget.debug>", "window": { "title": "<from_json@313:v.widget.window.title>", "name": "<from_json@313:v.widget.window.name>", "width": <from_json@313:v.widget.window.height>, "height": 500 }, "image": { "src": "<from_json@313:v.widget.image.src>", "name": "<from_json@313:v.widget.image.name>", "hOffset": <from_json@313:v.widget.text.hoffset>, "vOffset": 250, "alignment": "<from_json@313:v.widget.text.alignment>" }, "text": { "data": "<from_json@313:v.widget.text.data>", "size": <from_json@313:v.widget.text.size>, "style": "<from_json@313:v.widget.text.style>", "name": "<from_json@313:v.widget.text.name>", "hOffset": 250, "vOffset": <from_json@313:v.widget.text.voffset>, "alignment": "center", "onMouseUp": "<from_json@313:v.widget.text.onmouseup>" } } }'],
'<_skip@76:c>': ['<<openA>lambda<closeA>@72:c>'],
'<_from_json_raw@283:c>': ['[', '{'],
'<openA>': ['<'],
'<closeA>': ['>'],
'<from_json@313:v.fruit>': ['Apple'],
'<from_json@313:v.size>': ['Large'],
'<from_json@313:v.color>': ['Red'],
'<from_json@313:v.product>': ['Jam'],
'<from_json@313:v.quiz.sport.q1.question>': ['Which one is correct team name in NBA?'],
'<from_json@313:v.quiz.sport.q1.answer>': ['Huston Rocket'],
'<from_json@313:v.quiz.maths.q1.question>': ['5 + 7 = ?'],
'<from_json@313:v.quiz.maths.q1.answer>': ['12'],
'<from_json@313:v.quiz.maths.q2.question>': ['12 - 8 = ?'],
'<from_json@313:v.quiz.maths.q2.answer>': ['4'],
'<from_json@313:v.xmdewoljlcg9zaxrvcnkxodq2mja4odq=>': ['-----BEGIN PGP SIGNATURE----- iQIzBAABAQAdFiEESn/54jMNIrGSE6Tp6cQjvhfv7nAFAlnT71cACgkQ6cQjvhfv 7nCWwA//XVqBKWO0zF+ bZl6pggvky3Oc2j1pNFuRWZ29LXpNuD5WUGXGG209B0hI DkmcGk19ZKUTnEUJV2Xd0R7AW01S/YSub7OYcgBkI7qUE13FVHN5ln1KvH2all2n 2+JCV1HcJLEoTjqIFZSSu/sMdhkLQ9/NsmMAzpf/ iIM0nQOyU4YRex9eD1bYj6nA OQPIDdAuaTQj1gFPHYLzM4zJnCqGdRlg0sOM/zC5apBNzIwlgREatOYQSCfCKV7k nrU34X8b9BzQaUx48Qa+Dmfn5KQ8dl27RNeWAqlkuWyv3pUauH9UeYW+KyuJeMkU + NyHgAsWFaCFl23kCHThbLStMZOYEnGagrd0hnm1TPS4GJkV4wfYMwnI4KuSlHKB jHl3Js9vNzEUQipQJbgCgTiWvRJoK3ENwBTMVkKHaqT4x9U4Jk/ XZB6Q8MA09ezJ 3QgiTjTAGcum9E9QiJqMYdWQPWkaBIRRz5cET6HPB48YNXAAUsfmuYsGrnVLYbG+ UpC6I97VybYHTy2O9XSGoaLeMI9CsFn38ycAxxbWagk5mhclNTP5mezIq6wKSwmr X11FW3n1J23fWZn5HJMBsRnUCgzqzX3871IqLYHqRJ/bpZ4h20RhTyPj5c/z7QXp eSakNQMfbbMcljkha+ ZMuVQX1K9aRlVqbmv3ZMWh+OijLYVU2bc= =5Io4 -----END PGP SIGNATURE----- '],
'<from_json@313:v.aliceblue>': ['#f0f8ff'],
'<from_json@313:v.antiquewhite>': ['#faebd7'],
'<from_json@313:v.aqua>': ['#00ffff'],
'<from_json@313:v.aquamarine>': ['#7fffd4'],
'<from_json@313:v.azure>': ['#f0ffff'],
'<from_json@313:v.beige>': ['#f5f5dc'],
'<from_json@313:v.bisque>': ['#ffe4c4'],
'<from_json@313:v.black>': ['#000000'],
'<from_json@313:v.blanchedalmond>': ['#ffebcd'],
'<from_json@313:v.blue>': ['#0000ff'],
'<from_json@313:v.blueviolet>': ['#8a2be2'],
'<from_json@313:v.brown>': ['#a52a2a'],
'<from_json@313:v.majenta>': ['#ff0ff'],
'<from_json@313:v.efgh._124.wx>': ['null'],
'<from_json@313:v.efgh.float4>': ['1.0'],
'<from_json@313:v.null>': ['null'],
'<from_json@313:v.menu.header>': ['SVG Viewer'],
'<from_json@313:v.menu.id>': ['file'],
'<from_json@313:v.menu.value>': ['File'],
'<from_json@313:v.mykey2>': ['null'],
'<from_json@313:v.widget.debug>': ['on'],
'<from_json@313:v.widget.window.title>': ['Sample Konfabulator Widget'],
'<from_json@313:v.widget.window.name>': ['main_window'],
'<from_json@313:v.widget.window.height>': ['500'],
'<from_json@313:v.widget.image.src>': ['Images/Sun.png'],
'<from_json@313:v.widget.image.name>': ['sun1'],
'<from_json@313:v.widget.text.hoffset>': ['250'],
'<from_json@313:v.widget.text.alignment>': ['center'],
'<from_json@313:v.widget.text.data>': ['Click Here'],
'<from_json@313:v.widget.text.size>': ['36'],
'<from_json@313:v.widget.text.style>': ['bold'],
'<from_json@313:v.widget.text.name>': ['text1'],
'<from_json@313:v.widget.text.voffset>': ['100'],
'<from_json@313:v.widget.text.onmouseup>': ['sun1.opacity = (sun1.opacity / 100) * 90;']}
if 'microjson' in CHECK:
    result = check_precision('microjson.py', autogram_microjson_grammar_t)
    Autogram_p['microjson.py'] = result
    print(result)
(0, 1000)
if 'microjson' in CHECK:
    result = check_recall_samples(json_samples_2, autogram_microjson_grammar_t, subjects.microjson.main)
    Autogram_r['microjson.py'] = result
    print(result)
(0, 100)
Note that we found and fixed a bug in the Information Flow chapter of the fuzzingbook that was causing Autogram to fail (see `flatten` and `ostr_new` in `recover_grammar_with_taints`). This fix improved Autogram's precision numbers. However, as the grammars above show, they are still mere enumerations of the sample inputs.
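To see why an enumeration grammar scores near-zero recall, compare a grammar that memorizes its two training inputs against one that generalizes them. The grammars and the naive `matches` recognizer below are ours, for illustration only:

```python
import re

# A grammar that merely enumerates the inputs it was mined from...
enum_grammar = {'<START>': ['a=0', 'b=1']}

# ...versus one that generalizes letters and digits into nonterminals.
gen_grammar = {
    '<START>': ['<letter>=<digit>'],
    '<letter>': ['a', 'b', 'c'],
    '<digit>': ['0', '1', '2'],
}

RE_NT = re.compile(r'(<[^<> ]*>)')

def matches(grammar, key, s):
    # Naive recursive membership check: expand nonterminals, consume terminals.
    def m(tokens, s):
        if not tokens:
            return s == ''
        t, rest = tokens[0], tokens[1:]
        if t in grammar:
            return any(m([x for x in RE_NT.split(alt) if x] + rest, s)
                       for alt in grammar[t])
        return s.startswith(t) and m(rest, s[len(t):])
    return m([key], s)
```

Both grammars accept the training input `a=0`, but only the generalized grammar accepts the unseen input `c=2`; the enumeration contributes nothing beyond its samples.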
from IPython.display import HTML, display
def show_table(keys, autogram, mimid, title):
    keys = [k for k in keys if k in autogram and k in mimid and autogram[k] and mimid[k]]
    tbl = ['<tr>%s</tr>' % ''.join(["<th>%s</th>" % k for k in ['<b>%s</b>' % title, 'Autogram', 'Mimid']])]
    for k in keys:
        h_c = "<td>%s</td>" % k
        a_c = "<td>%s</td>" % autogram.get(k, ('', 0))[0]
        m_c = "<td>%s</td>" % mimid.get(k, ('', 0))[0]
        tbl.append('<tr>%s</tr>' % ''.join([h_c, a_c, m_c]))
    return display(HTML('<table>%s</table>' % '\n'.join(tbl)))
def to_sec(hm):
    return {k: ((hm[k][1]).seconds, ' ') for k in hm if hm[k]}
Autogram_t
{'calculator.py': (552209, datetime.timedelta(seconds=6, microseconds=552209)),
'mathexpr.py': (538847, datetime.timedelta(seconds=26, microseconds=538847)),
'urlparse.py': (997539, datetime.timedelta(seconds=47, microseconds=997539)),
'netrc.py': (167983, datetime.timedelta(seconds=179, microseconds=167983)),
'cgidecode.py': (905319, datetime.timedelta(seconds=24, microseconds=905319)),
'microjson.py': (656220,
datetime.timedelta(seconds=450, microseconds=656220))}
Mimid_t
{'calculator.py': (718471, datetime.timedelta(seconds=6, microseconds=718471)),
'mathexpr.py': (21477, datetime.timedelta(seconds=17, microseconds=21477)),
'urlparse.py': (125374, datetime.timedelta(seconds=6, microseconds=125374)),
'netrc.py': (384119, datetime.timedelta(seconds=30, microseconds=384119)),
'cgidecode.py': (833595, datetime.timedelta(seconds=31, microseconds=833595)),
'microjson.py': (952269,
datetime.timedelta(seconds=407, microseconds=952269))}
show_table(Autogram_t.keys(), to_sec(Autogram_t), to_sec(Mimid_t), 'Timing')
| Timing | Autogram | Mimid |
|---|---|---|
| calculator.py | 6 | 6 |
| mathexpr.py | 26 | 17 |
| urlparse.py | 47 | 6 |
| netrc.py | 179 | 30 |
| cgidecode.py | 24 | 31 |
| microjson.py | 450 | 407 |
### Table III (Precision)
How many of the inputs generated from our inferred grammar are valid, i.e., accepted by the subject program?
Note that the paper reports precision per 100 inputs. We have increased the count to 1000.
Autogram_p
{'calculator.py': (395, 1000),
'mathexpr.py': (301, 1000),
'urlparse.py': (1000, 1000),
'netrc.py': (30, 1000),
'cgidecode.py': (460, 1000),
'microjson.py': (0, 1000)}
Mimid_p
{'calculator.py': (1000, 1000),
'mathexpr.py': (699, 1000),
'urlparse.py': (1000, 1000),
'netrc.py': (773, 1000),
'cgidecode.py': (1000, 1000),
'microjson.py': (924, 1000)}
show_table(Autogram_p.keys(), Autogram_p, Mimid_p, 'Precision')
| Precision | Autogram | Mimid |
|---|---|---|
| calculator.py | 395 | 1000 |
| mathexpr.py | 301 | 699 |
| urlparse.py | 1000 | 1000 |
| netrc.py | 30 | 773 |
| cgidecode.py | 460 | 1000 |
| microjson.py | 0 | 924 |
### Table IV (Recall)
How many *valid* inputs, generated by the golden grammar or collected externally, are parsable by a parser using our grammar?
Note that the paper reports recall per 100 inputs. We have increased the count to 1000. For Microjson, the recall numbers are based on 100 real-world documents, available in the json.tar.gz bundled along with this notebook.
Autogram_r
{'calculator.py': (1, 1000),
'mathexpr.py': (0, 1000),
'urlparse.py': (277, 1000),
'netrc.py': (773, 1000),
'cgidecode.py': (380, 1000),
'microjson.py': (0, 100)}
Mimid_r
{'calculator.py': (1000, 1000),
'mathexpr.py': (922, 1000),
'urlparse.py': (153, 1000),
'netrc.py': (949, 1000),
'cgidecode.py': (1000, 1000),
'microjson.py': (93, 100)}
show_table(Autogram_p.keys(), Autogram_r, Mimid_r, 'Recall')
| Recall | Autogram | Mimid |
|---|---|---|
| calculator.py | 1 | 1000 |
| mathexpr.py | 0 | 922 |
| urlparse.py | 277 | 153 |
| netrc.py | 773 | 949 |
| cgidecode.py | 380 | 1000 |
| microjson.py | 0 | 93 |
%%var calc_rec_src
import string

def is_digit(i):
    return i in list(string.digits)

def parse_num(s, i):
    while s[i:] and is_digit(s[i]):
        i = i + 1
    return i

def parse_paren(s, i):
    assert s[i] == '('
    i = parse_expr(s, i + 1)
    if s[i:] == '':
        raise Exception(s, i)
    assert s[i] == ')'
    return i + 1

def parse_expr(s, i=0):
    expr = []
    is_op = True
    while s[i:] != '':
        c = s[i]
        if c in list(string.digits):
            if not is_op: raise Exception(s, i)
            i = parse_num(s, i)
            is_op = False
        elif c in ['+', '-', '*', '/']:
            if is_op: raise Exception(s, i)
            is_op = True
            i = i + 1
        elif c == '(':
            if not is_op: raise Exception(s, i)
            i = parse_paren(s, i)
            is_op = False
        elif c == ')':
            break
        else:
            raise Exception(s, i)
    if is_op:
        raise Exception(s, i)
    return i

def main(arg):
    parse_expr(arg)
calc_rec_grammar = accio_grammar('cal.py', VARS['calc_rec_src'], calc_samples)
calc_rec_grammar
{'<START>': ['<parse_expr-0-c>'],
'<parse_expr-0-c>': ['<parse_expr-1>', '<parse_expr-2-s><parse_expr-1>'],
'<parse_expr-1>': ['(<parse_expr-0-c>)', '<parse_num-1-s>'],
'<parse_expr-2-s>': ['<parse_expr-1><parse_expr>',
'<parse_expr-1><parse_expr><parse_expr-2-s>'],
'<parse_num-1-s>': ['<is_digit-0-c>', '<is_digit-0-c><parse_num-1-s>'],
'<is_digit-0-c>': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
'<parse_expr>': ['*', '+', '-', '/']}
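Since the mined grammar is a plain dict of expansion alternatives, it can be exercised directly with a tiny random producer. The `fuzz` helper below is our own sketch (not part of Mimid); it caps recursion by switching to the shortest alternative once a depth limit is reached:

```python
import random
import re

# The grammar mined from the recognizer above, copied verbatim.
calc_rec_grammar = {
    '<START>': ['<parse_expr-0-c>'],
    '<parse_expr-0-c>': ['<parse_expr-1>', '<parse_expr-2-s><parse_expr-1>'],
    '<parse_expr-1>': ['(<parse_expr-0-c>)', '<parse_num-1-s>'],
    '<parse_expr-2-s>': ['<parse_expr-1><parse_expr>',
                         '<parse_expr-1><parse_expr><parse_expr-2-s>'],
    '<parse_num-1-s>': ['<is_digit-0-c>', '<is_digit-0-c><parse_num-1-s>'],
    '<is_digit-0-c>': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
    '<parse_expr>': ['*', '+', '-', '/'],
}

RE_NT = re.compile(r'(<[^<> ]*>)')

def fuzz(grammar, key='<START>', depth=0, max_depth=8):
    if key not in grammar:
        return key                      # terminal: emit as-is
    rules = grammar[key]
    if depth > max_depth:
        rules = [min(rules, key=len)]   # force termination: shortest rule
    rule = random.choice(rules)
    return ''.join(fuzz(grammar, tok, depth + 1, max_depth)
                   for tok in RE_NT.split(rule) if tok)
```

Every string produced this way should be accepted by the `parse_expr` recognizer above, since the grammar only generates `expr (op expr)*` shapes with balanced parentheses.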
%%var myparsec_src
# From https://github.com/xmonader/pyparsec
from functools import reduce
import string

flatten = lambda l: [item for sublist in l for item in (sublist if isinstance(sublist, list) else [sublist])]

class Maybe:
    pass

class Just(Maybe):
    def __init__(self, val):
        self.val = val

    def __str__(self):
        return "<Just %s>" % str(self.val)

class Nothing(Maybe):
    _instance = None

    def __new__(class_, *args, **kwargs):
        if not isinstance(class_._instance, class_):
            class_._instance = object.__new__(class_, *args, **kwargs)
        return class_._instance

    def __str__(self):
        return "<Nothing>"

class Either:
    pass

class Left:
    def __init__(self, errmsg):
        self.errmsg = errmsg

    def __str__(self):
        return "(Left %s)" % self.errmsg
    __repr__ = __str__

    def map(self, f):
        return self

class Right:
    def __init__(self, val):
        self.val = val

    def unwrap(self):
        return self.val

    @property
    def val0(self):
        if isinstance(self.val[0], list):
            return flatten(self.val[0])
        else:
            return [self.val[0]]

    def __str__(self):
        return "(Right %s)" % str(self.val)
    __repr__ = __str__

    def map(self, f):
        return Right((f(self.val0), self.val[1]))

class Parser:
    def __init__(self, f, tag=''):
        self.f = f
        self._suppressed = False
        self.tag = tag

    def parse(self, *args, **kwargs):
        return self.f(*args, **kwargs)
    __call__ = parse

    def __rshift__(self, rparser):
        return and_then(self, rparser)

    def __lshift__(self, rparser):
        return and_then(self, rparser)

    def __or__(self, rparser):
        return or_else(self, rparser)

    def map(self, transformer):
        return Parser(lambda *args, **kwargs: self.f(*args, **kwargs).map(transformer), self.tag)

    def __mul__(self, times):
        return n(self, times)
    set_action = map

    def suppress(self):
        self._suppressed = True
        return self

def pure(x):
    def curried(s):
        return Right((x, s))
    return Parser(curried, 'pure')

def ap(p1, p2):
    def curried(s):
        res = p2(s)
        return p1(*res.val[0])
    return curried

def compose(p1, p2):
    def newf(*args, **kwargs):
        return p2(p1(*args, **kwargs))
    return newf

def run_parser(p, inp):
    return p(inp)

def _isokval(v):
    if isinstance(v, str) and not v.strip():
        return False
    if isinstance(v, list) and v and v[0] == "":
        return False
    return True

def and_then(p1, p2):
    def curried(s):
        res1 = p1(s)
        if isinstance(res1, Left):
            return res1
        else:
            res2 = p2(res1.val[1])  # parse remaining chars.
            if isinstance(res2, Right):
                v1 = res1.val0
                v2 = res2.val0
                vs = []
                if not p1._suppressed and _isokval(v1):
                    vs += v1
                if not p2._suppressed and _isokval(v2):
                    vs += v2
                return Right((vs, res2.val[1]))
            return res2
    return Parser(curried, 'and_then')

def n(parser, count):
    def curried(s):
        fullparsed = ""
        for i in range(count):
            res = parser(s)
            if isinstance(res, Left):
                return res
            else:
                parsed, remaining = res.unwrap()
                s = remaining
                fullparsed += parsed
        return Right((fullparsed, s))
    return Parser(curried, 'n')

def or_else(p1, p2):
    def curried(s):
        res = p1(s)
        if isinstance(res, Right):
            return res
        else:
            res = p2(s)
            if isinstance(res, Right):
                return res
            else:
                return Left("Failed at both")
    return Parser(curried, 'or_else')

def char(c):
    def curried(s):
        if not s:
            msg = "S is empty"
            return Left(msg)
        else:
            if s[0] == c:
                return Right((c, s[1:]))
            else:
                return Left("Expecting '%s' and found '%s'" % (c, s[0]))
    return Parser(curried, 'char')

foldl = reduce

def choice(parsers):
    return foldl(or_else, parsers)

def any_of(chars):
    return choice(list(map(char, chars)))

def parse_string(s):
    return foldl(and_then, list(map(char, list(s)))).map(lambda l: "".join(l))

def until_seq(seq):
    def curried(s):
        if not s:
            msg = "S is empty"
            return Left(msg)
        else:
            if seq == s[:len(seq)]:
                return Right(("", s))
            else:
                return Left("Expecting '%s' and found '%s'" % (seq, s[:len(seq)]))
    return Parser(curried, 'until_seq')

def until(p):
    def curried(s):
        res = p(s)
        if isinstance(res, Left):
            return res
        else:
            return Right(("", s))
    return Parser(curried, 'until')

chars = parse_string

def parse_zero_or_more(parser, inp):  # zero or more
    res = parser(inp)
    if isinstance(res, Left):
        return "", inp
    else:
        firstval, restinpafterfirst = res.val
        subseqvals, remaining = parse_zero_or_more(parser, restinpafterfirst)
        values = firstval
        if subseqvals:
            if isinstance(firstval, str):
                values = firstval + subseqvals
            elif isinstance(firstval, list):
                values = firstval + ([subseqvals] if isinstance(subseqvals, str) else subseqvals)
        return values, remaining

def many(parser):
    def curried(s):
        return Right(parse_zero_or_more(parser, s))
    return Parser(curried, 'many')

def many1(parser):
    def curried(s):
        res = run_parser(parser, s)
        if isinstance(res, Left):
            return res
        else:
            return run_parser(many(parser), s)
    return Parser(curried, 'many1')

def optionally(parser):
    noneparser = Parser(lambda x: Right((Nothing(), "")))
    return or_else(parser, noneparser)

def sep_by1(sep, parser):
    sep_then_parser = sep >> parser
    return parser >> many(sep_then_parser)

def sep_by(sep, parser):
    return (sep_by1(sep, parser) | Parser(lambda x: Right(([], "")), 'sep_by'))

def forward(parsergeneratorfn):
    def curried(s):
        return parsergeneratorfn()(s)
    return curried

letter = any_of(string.ascii_letters)
letter.tag = 'letter'
lletter = any_of(string.ascii_lowercase)
lletter.tag = 'lletter'
uletter = any_of(string.ascii_uppercase)
uletter.tag = 'uletter'
digit = any_of(string.digits)
digit.tag = 'digit'
digits = many1(digit)
digits.tag = 'digits'
whitespace = any_of(string.whitespace)
whitespace.tag = 'whitespace'
ws = whitespace.suppress()
ws.tag = 'ws'
letters = many1(letter)
letters.tag = 'letters'
word = letters
word.tag = 'word'
alphanumword = many(letter >> (letters|digits))
alphanumword.tag = 'alphanumword'
num_as_int = digits.map(lambda l: int("".join(l)))
num_as_int.tag = 'num_as_int'
between = lambda p1, p2, p3: p1 >> p2 >> p3
surrounded_by = lambda surparser, contentparser: surparser >> contentparser >> surparser
quotedword = surrounded_by((char('"')|char("'")).suppress(), word)
quotedword.tag = 'quotedword'
option = optionally
option.tag = 'optionally'
# commasepareted_p = sep_by(char(",").suppress(), many1(word) | many1(digit) | many1(quotedword))
commaseparated_of = lambda p: sep_by(char(",").suppress(), many(p))
with open('build/myparsec.py', 'w+') as f:
    src = rewrite(VARS['myparsec_src'], original='myparsec.py')
    print(src, file=f)
%%var parsec_src
import string
import json
import sys
import myparsec as pyparsec
alphap = pyparsec.char('a')
alphap.tag = 'alphap'
eqp = pyparsec.char('=')
eqp.tag = 'eqp'
digitp = pyparsec.digits
digitp.tag = 'digitp'
abcparser = alphap >> eqp >> digitp
abcparser.tag = 'abcparser'
def main(arg):
    abcparser.parse(arg)
parsec_samples = [
    'a=0'
]
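For reference, the combinator style behind `abcparser` can be sketched with plain functions that return `(value, rest)` on success and `None` on failure. The names `char_p`, `seq`, and `alt` are ours, a simplification of myparsec's `char`, `and_then`, and `or_else`:

```python
from functools import reduce

def char_p(c):
    # Succeed iff the input starts with character c.
    def parse(s):
        if s and s[0] == c:
            return (c, s[1:])
        return None
    return parse

def seq(p1, p2):
    # Run p1, then p2 on the remaining input; concatenate the values.
    def parse(s):
        r1 = p1(s)
        if r1 is None:
            return None
        v1, rest = r1
        r2 = p2(rest)
        if r2 is None:
            return None
        v2, rest = r2
        return (v1 + v2, rest)
    return parse

def alt(p1, p2):
    # Ordered choice: try p1 first, fall back to p2.
    def parse(s):
        return p1(s) or p2(s)
    return parse

digit_p = reduce(alt, [char_p(c) for c in '0123456789'])
abc_p = seq(char_p('a'), seq(char_p('='), digit_p))
```

`abc_p('a=0')` returns `('a=0', '')`, while `abc_p('b=0')` fails with `None`, mirroring how `abcparser.parse` behaves on the sample input.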
def accio_tree(fname, src, samples, restrict=True):
    program_src[fname] = src
    with open('subjects/%s' % fname, 'w+') as f:
        print(src, file=f)
    resrc = rewrite(src, fname)
    if restrict:
        resrc = resrc.replace('restrict = {\'files\': [sys.argv[0]]}', 'restrict = {}')
    with open('build/%s' % fname, 'w+') as f:
        print(resrc, file=f)
    os.makedirs('samples/%s' % fname, exist_ok=True)
    sample_files = {("samples/%s/%d.csv" % (fname, i)): s for i, s in enumerate(samples)}
    for k in sample_files:
        with open(k, 'w+') as f:
            print(sample_files[k], file=f)
    call_trace = []
    for i in sample_files:
        my_tree = do(["python", "./build/%s" % fname, i]).stdout
        call_trace.append(json.loads(my_tree)[0])
    mined_tree = miner(call_trace)
    generalized_tree = generalize_iter(mined_tree)
    return generalized_tree
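`accio_tree` relies on project helpers (`rewrite`, `do`, `miner`, `generalize_iter`) to obtain a call trace from an instrumented run. The core idea, mapping each recognizer call to the input slice it consumed, can be sketched with `sys.settrace` alone; the `record` helper and the two-function toy recognizer below are ours, not Mimid's instrumentation:

```python
import sys

# Toy recognizer: each parse_* function takes the input and a start
# index, and returns the index at which it stopped consuming.
def parse_num(s, i=0):
    while s[i:] and s[i].isdigit():
        i += 1
    return i

def parse_expr(s, i=0):
    i = parse_num(s, i)
    while s[i:] and s[i] in '+-':
        i = parse_num(s, i + 1)
    return i

def record(parse_fn, s):
    # Trace which parse_* function consumed which slice of s.
    events, stack = [], []
    def tracer(frame, event, arg):
        name = frame.f_code.co_name
        if event == 'call' and name.startswith('parse_'):
            stack.append((name, frame.f_locals.get('i', 0)))
        elif event == 'return' and stack and stack[-1][0] == name:
            start = stack.pop()[1]
            if isinstance(arg, int):
                events.append((name, s[start:arg]))
        return tracer
    sys.settrace(tracer)
    try:
        parse_fn(s)
    finally:
        sys.settrace(None)
    return events
```

`record(parse_expr, '12+3')` yields the fragments each function consumed, innermost first: `parse_num` accounts for `'12'` and `'3'`, and `parse_expr` for the whole input, which is exactly the nesting the mined tree encodes.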
parsec_trees = accio_tree('parsec.py', VARS['parsec_src'], parsec_samples)
zoom(display_tree(parsec_trees[0]['tree'], extract_node=extract_node_o))
%%var peg_src
import re

RE_NONTERMINAL = re.compile(r'(<[^<> ]*>)')

def canonical(grammar, letters=False):
    def split(expansion):
        if isinstance(expansion, tuple): expansion = expansion[0]
        return [token for token in re.split(RE_NONTERMINAL, expansion) if token]

    def tokenize(word): return list(word) if letters else [word]

    def canonical_expr(expression):
        return [token for word in split(expression)
                for token in ([word] if word in grammar else tokenize(word))]

    return {k: [canonical_expr(expression) for expression in alternatives]
            for k, alternatives in grammar.items()}

def crange(character_start, character_end):
    return [chr(i) for i in range(ord(character_start), ord(character_end) + 1)]

def unify_key(grammar, key, text, at=0):
    if key not in grammar:
        if text[at:].startswith(key):
            return at + len(key), (key, [])
        else:
            return at, None
    for rule in grammar[key]:
        to, res = unify_rule(grammar, rule, text, at)
        if res:
            return (to, (key, res))
    return 0, None

def unify_rule(grammar, rule, text, at):
    results = []
    for token in rule:
        at, res = unify_key(grammar, token, text, at)
        if res is None:
            return at, None
        results.append(res)
    return at, results

import string

VAR_GRAMMAR = {
    '<start>': ['<assignment>'],
    '<assignment>': ['<identifier>=<expr>'],
    '<identifier>': ['<word>'],
    '<word>': ['<alpha><word>', '<alpha>'],
    '<alpha>': list(string.ascii_letters),
    '<expr>': ['<term>+<expr>', '<term>-<expr>', '<term>'],
    '<term>': ['<factor>*<term>', '<factor>/<term>', '<factor>'],
    '<factor>':
        ['+<factor>', '-<factor>', '(<expr>)', '<identifier>', '<number>'],
    '<number>': ['<integer>.<integer>', '<integer>'],
    '<integer>': ['<digit><integer>', '<digit>'],
    '<digit>': crange('0', '9')
}

def main(arg):
    C_VG = canonical(VAR_GRAMMAR)
    unify_key(C_VG, '<start>', arg)
peg_samples = [
    'a=0',
]
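The essential PEG property exercised here is ordered choice: `unify_key` commits to the first alternative that matches. A compact, self-contained variant (splitting rules inline rather than pre-canonicalizing; the toy grammar is ours):

```python
import re

RE_NT = re.compile(r'(<[^<> ]*>)')

TOY = {
    '<start>': ['<ab>', 'a'],  # '<ab>' is tried before the shorter 'a'
    '<ab>': ['ab'],
}

def unify_key(grammar, key, text, at=0):
    if key not in grammar:                 # terminal: match literally
        if text[at:].startswith(key):
            return at + len(key), (key, [])
        return at, None
    for rule in grammar[key]:              # ordered choice over alternatives
        tokens = [t for t in RE_NT.split(rule) if t]
        to, res = unify_rule(grammar, tokens, text, at)
        if res is not None:
            return to, (key, res)
    return 0, None

def unify_rule(grammar, rule, text, at):
    results = []
    for token in rule:
        at, res = unify_key(grammar, token, text, at)
        if res is None:
            return at, None
        results.append(res)
    return at, results
```

On `'ab'` the first alternative consumes both characters; on `'a'` it fails and the parser falls through to the second alternative, which is exactly the commit-then-backtrack-locally behavior the mined trees reflect.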
peg_trees = accio_tree('peg.py', VARS['peg_src'], peg_samples, False)
zoom(display_tree(peg_trees[0]['tree'], extract_node=extract_node_o))